CHAPTER 1


INTRODUCTION 


The basic concept in designing parallel supercomputers is to employ an interconnection
network to connect a large number of processors and memory modules such that memory modules
can be accessed in parallel and processors can communicate with each other without serious
communication conflict or traffic congestion. Since the trend in designing supercomputers is to
use off-the-shelf products for processors and memory modules, the bottleneck is the
interconnection network. Specifically, we define an interconnection network to be a connection of
switches and links that allows data communication between processors and/or memory
modules in a system consisting of multiple processors. Many factors are involved in
determining the cost-effectiveness of a particular network design, including the computational tasks
it will be used for, the desired speed of interprocessor data transfers, the actual hardware
implementation of the network, the number of processors in the system, and cost constraints
on the construction [Bhu87]. Interconnection networks can be classified into two categories
based on network topologies: static networks and dynamic networks [Hwa87]. Examples of
static networks are the linear array, ring, star, tree, mesh, systolic array, chordal ring, hypercube,
and cube-connected-cycle. Examples of dynamic networks are single-stage, crossbar, and
multistage interconnection networks. The two major switching methodologies for
interconnection networks are circuit switching and packet switching. In circuit switching, a physical
path is actually established between a source and a destination. In packet switching, data
packets are routed through the interconnection network without establishing a physical
connection path. In general, circuit switching is much more suitable for bulk data transmission,
and packet switching is more efficient for short data messages.

In this report, we focus on the domain of multistage interconnection networks, since
they offer a flexible environment to meet real-time processing requirements for either
multiprocessor or shared-memory systems. The functionalities, topological relationships, and fault
tolerance of various multistage interconnection networks are discussed. The rest
of this report is organized into the following five chapters.

In Chapter 2, two characteristic functions O and I are introduced to uniquely describe
any Multistage Interconnection Network (MIN) constructed by general shuffle connections.
All the MINs constructed by general shuffle connections are shown to be in a class of
equivalent Banyan type MINs, named the $\log_2 N$-stage Column-Permute interconnection
network ($\log_2 N$ $CP^{min}$). Based on these two characteristic functions, we show that routing
algorithms, network construction rules, network equivalence properties, and network
transformation rules can be directly established. With the design of reconfigurable systems in
mind, we show that the equivalence among networks can easily be described by linear
transformations on characteristic functions. In other words, the equivalence can be interpreted
as a renaming scheme on the inputs and outputs of a network. We explain why the routing
scheme on each network is always destination-oriented and source-preserved.

In Chapter 3, the $CP^{min}$ networks of Chapter 2 are extended to a class of
Bit-Permute-Complement (BPC) type multistage interconnection networks. This class of BPC networks is
based on a more general interconnection model than that of $CP^{min}$. Typically, they possess a
very simple routing scheme such that for any communication path the source address can be
easily preserved and the destination address can be used as routing tags. General rules for
transforming a multistage interconnection network into another in this class by simply renumbering the
inputs and outputs of the network are presented. Both distributed and global routing schemes
after the transformation are discussed. By using the proposed network transformation rules,
algorithms developed on a machine using one multistage interconnection network can be directly
used on another machine which employs a different network.

In Chapter 4, we study the permutation capability of the class of BPC multistage interconnection
networks defined by a general model using bit-permute-complement connections, which
includes the Omega network, baseline network, indirect binary cube network, etc. as special
cases. Several questions are addressed in this chapter. How can we easily
characterize all the admissible permutations of a network? How can we determine whether or not a
permutation is admissible on a network? We start our discussion with Omega networks because of
their regular structure, and then generalize the problem to the general model. We show that
the set of admissible permutations of a network can be characterized by very simple bit
relations depending on the two characteristic functions which specify this network. The time
complexity of our proposed algorithm to determine the admissibility of a permutation on a
multistage interconnection network is $O(N)$, where $N$ is the number of inputs/outputs of the
network.

In Chapter 5, the fault tolerance capability of a multistage interconnection network and
the technique to reconfigure a network under multiple faults are discussed. Both faulty
switching elements and faulty communication links are considered in our fault model.
Generally speaking, in a multistage interconnection network used for interprocessor connections, if
faults occur, many input-output communication paths will no longer be available. A solution
to this is to allow data to propagate through the faulty network in multiple passes, such that
reasonable communication capability can be maintained. In this chapter, a fault-tolerant
reconfiguration scheme is developed for an $N$-processor system interconnected by a
$\log_2 N$-stage Omega network under multiple faults. Regardless of whether the faults on the Omega
network are critical or not, a deadlock-free environment is provided for the $N$-processor
system by applying our reconfiguration scheme. The reconfiguration of such a multiprocessor
system is based on three principles: disable processors whose communication capabilities are
completely destroyed, eliminate faulty components, and, if necessary, sacrifice some usable
components implicitly without knowing the actual locations of these components. The
reconfigured system is a surviving system: it may be an integrated one consisting of
a subset of the $N$ processors (including the case of all $N$ processors), or it may be
partitioned into a number of subsystems such that the dynamic full access property within each
subsystem (system) can be maintained. A deadlock-free shortest-path routing table is obtained
for each processor in the surviving system (subsystem) to avoid the danger of deadlock traps
caused by incautiously using unidirectional communication paths rather than bidirectional
communication paths between some processors. The time complexity of our reconfiguration
scheme is analyzed as well.

CHAPTER 2


CHARACTERIZATION  OF  MULTISTAGE 
INTERCONNECTION  NETWORKS 


2.1.  INTRODUCTION 

The use of Multistage Interconnection Networks (MINs) is considered a very
cost-effective means to provide efficient interprocessor and/or processor-memory communication
[Fen81] [WuFe80] [Sie79]. Examples are the Omega network [Law75], baseline and reverse
baseline network [WuFe80], indirect binary cube network [Pea77], Delta network [Pat81],
cube network [SiSm78], flip network [Bat76] and modified data manipulator [Fen74]. It is
well known that all those MINs are constructed through a particular type of connection called
general shuffle connections. It is clear that, in addition to those seven MINs, there exists a
huge set of MINs which are also constructed by general shuffle connections. It is also well
known that the capability of each MIN in terms of non-conflicting communication
permutations is different. For example, it has been reported in [NaSa81] that an Omega network can
realize cyclic shifts, p-ordering, inverse p-ordering, etc. and cannot realize operations such as
bit-reversal. Since each application will require a different set of permutations in order to
perform its execution optimally, it is extremely important for a system designer to be able to
identify a MIN which is best suited for the application's needs. However, due to the lack of a
general understanding of network characteristics, so far, a designer has had to select a MIN out of
the seven known MINs (instead of out of millions of MINs) in a rather ad hoc manner. This
has greatly limited the achievable performance of a parallel supercomputer. Since it is
impossible to investigate the millions of MINs one by one, it is essential to be able to characterize
those MINs such that a precise quantitative description of them will then be available and a
MIN can be immediately investigated as soon as its characteristic functions are available.
Since the characteristic functions of a MIN carry information regarding the permutation
capability of this MIN, it will then be possible for a designer to select a MIN out of millions
of MINs to optimally meet application needs. In this chapter, we will show how to
characterize the whole set of MINs which are constructed by general shuffle connections.

Many theoretical properties of the above seven MINs have been discussed, which are
summarized as follows. Their inputs and outputs are fully connected and a simple routing
scheme can be applied from any input to any output [Fen81] [WuFe80] [Sie79] [Law75]
[Pea77] [Pat81] [SiSm78] [Bat76] [Fen74]. They have the buddy property [WuFe80]
[Agr83] [AgSw88] and belong to bidelta networks [KrSn86]. The topological equivalence
problem among them has been exhibited based on the fact that their graph representations are
isomorphic to the Banyan graph [GoLi78]; that is, they are all equivalent Banyan type MINs.
In [BeFo88], Bermond and Fourneau discussed some Banyan type MINs with
independent connections and showed an approach to explain their topological equivalence.
In [HuTr86], Huang and Tripathi have shown that a MIN can be represented by a finite state
machine. However, other than issues of theoretical interest, these works still tell us little
about how to help designers investigate or understand the capabilities of the other millions
of MINs. Therefore, their contribution towards the practical application needs which we
mentioned earlier is rather limited. This again shows that, in addition to the discussion of
common topological properties, it is extremely important to be able to characterize MINs and
directly discuss the network capability, routing algorithms, and construction rules in terms of the
characteristic functions. We believe that this is the starting point for system designers to fully
exploit the capabilities of MINs and take maximal advantage of them.

In this chapter, our concern is to give a global view of the characteristics of various
MINs. First of all, the topological structure of the whole class of $\log_2 N$-stage Banyan type
MINs constructed by general shuffle connections (which are called column permute
connections in this chapter) is defined in detail; this class includes all seven MINs mentioned above as
special cases. We denote them as $\log_2 N$ $CP^{min}$. A path-descriptive methodology is used to
characterize topological features of the whole class, and we show that two important
permutation functions, O and I, can uniquely specify each MIN in the class. We call these two the
characteristic functions. Based on functions O and I, many important features of MINs,
including routing algorithms, network construction, permutation capability, etc., can be
immediately established or examined. As for topological equivalence, we show
that the equivalence relation among MINs can be subdivided and is intrinsically a linear
transformation on functions O and I. In other words, the transfer from one MIN to another
can be viewed as a renaming scheme on the inputs and outputs of a MIN. This has a significant
impact on the design of a reconfigurable system. As for communication, we show
that the routing scheme for each MIN in the class can be directly derived from the
characteristic functions O and I. We explain why the distributed routing algorithm for each MIN is
destination-oriented (i.e., its routing tags can be determined by the destination addresses
only) and source-preserved (i.e., the address of the source input can automatically be
preserved without extra effort). Note that, in this chapter, since we limit our discussion to
$\log_2 N$-stage MINs only, $\log_2 N$ $CP^{min}$ and $CP^{min}$ are used interchangeably.

The rest of this chapter is organized as follows. In Section 2.2, we introduce a class of
basic connection patterns defined by $CP(n)$ permutations. Section 2.3 is devoted to the
topological structure of the $CP^{min}$ and the outline of two important characteristic functions, O and
I. We show how they are related to the equivalence and transformation among MINs. In
Section 2.4, we show the relation between general routing algorithms and the characteristic
functions. In Section 2.5, the decomposition and partitioning of various networks are
discussed. Finally, Section 2.6 concludes this chapter.

2.2.  BASIC  CONNECTION  PATTERN 

In order to describe a network in terms of permutations on $N$ symbols, we may label the
links of this network at the inputs and outputs of all switching elements following their
natural order in the drawing. A label is a number between 0 and $N-1$ whose binary
representation (i.e., address) can be denoted by $(x_{n-1}, \ldots, x_1, x_0)$, where $x_0$ is the least
significant bit (LSB). Each connection link is defined by two labels and each connection pattern
between two adjacent switching stages is specified by a permutation of $N$ labels.

An $n$-stage MIN constructed with $n+1$ connection patterns has often been defined using
these permutations [Par80]. For example, the Omega network is defined as $n$ consecutive
perfect shuffle permutations plus an identity permutation for connecting the outputs of the
network. A perfect shuffle permutation is defined as a circular left shift of the binary
representation of an operand.

Note that not every arbitrarily selected set of $n+1$ connection patterns can be used to design a
MIN with the desired properties. For example, any disturbance in the order of the $n+1$
connection patterns of the Omega network may result in a new network which no longer
preserves the full connectivity property. In other words, the topological structure of a MIN is
closely related to its functional behavior, and its construction should be based on a systematic
method for selecting a sequence of valid connection patterns.
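
The effect of such a reordering can be checked by brute force. The following Python sketch is an illustration rather than part of the original construction. It assumes that each 2x2 switching element may forward a message to either of its two output links, which amounts to setting the least significant bit of the link label freely. The helper names `reachable_outputs` and `fully_connected`, and the particular reordering tried, are hypothetical choices made for this example.

```python
def shuffle(x, n):
    """Perfect shuffle: circular left shift of the n-bit label x."""
    mask = (1 << n) - 1
    return ((x << 1) & mask) | (x >> (n - 1))


def identity(x, n):
    """Identity connection pattern."""
    return x


def reachable_outputs(patterns, n, source):
    """Set of output labels reachable from `source`.

    `patterns` holds the n+1 connection patterns; one switching stage sits
    between each pair of consecutive patterns.
    """
    frontier = {patterns[0](source, n)}
    for pattern in patterns[1:]:
        # Assumption: a 2x2 switch can deliver to either output link (LSB 0 or 1).
        after_switch = {link & ~1 for link in frontier} | {link | 1 for link in frontier}
        frontier = {pattern(link, n) for link in after_switch}
    return frontier


def fully_connected(patterns, n):
    """True if every input can reach all 2**n outputs."""
    return all(len(reachable_outputs(patterns, n, s)) == 1 << n
               for s in range(1 << n))


n = 3
omega = [shuffle] * n + [identity]                # n shuffles, then the identity
disturbed = [shuffle, identity, shuffle, shuffle]  # same patterns, hypothetical reordering
print(fully_connected(omega, n))       # True
print(fully_connected(disturbed, n))   # False
```

For $n = 3$ the reordered network reaches only four of the eight outputs from any input, because one switching stage rewrites a bit that the previous stage has already set.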

In this section, we introduce a class of basic connection patterns. Using this set of
permutations, we can define construction rules and describe the functional behavior of various
MINs.

Consider the numbers from 0 to $N-1$ and the binary representation of each of them,
$X = (x_{n-1}, \ldots, x_1, x_0)$. We define the class $CP(n)$ of Column-Permute permutation functions by a
permutation on the indices of the representation.

DEFINITION 1: A CP permutation $P$ in $CP(n)$ is specified by an $n$-tuple vector
$V = (\theta(n-1), \ldots, \theta(1), \theta(0))$, where $\theta$ is a permutation function on $(n-1, \ldots, 1, 0)$ and
$(\theta(n-1), \ldots, \theta(1), \theta(0))$ is the image of $\theta$. The mapping of $P$ on
$X = (x_{n-1}, \ldots, x_1, x_0) \in [0, N-1]$ is obtained as follows:

$$P \in CP(n) \quad (N = 2^n)$$

$$P(x_{n-1}, \ldots, x_1, x_0) = (P(x_{n-1}), \ldots, P(x_1), P(x_0)) = (y_{n-1}, \ldots, y_1, y_0).$$

It is very easy to show that the $P$ operation is closed with respect to the domain $[0, N-1]$, i.e.,
$P(X) \in [0, N-1]$ for all $X \in [0, N-1]$. Similarly, we can define the inverse function $P^{-1}$ of
$P$.

DEFINITION 2: Let $\theta^{-1}$ be the inverse permutation function of $\theta$. The inverse CP
permutation $P^{-1}$ of $P$ defined in Definition 1 is specified by an $n$-tuple vector
$V^{-1} = (\theta^{-1}(n-1), \ldots, \theta^{-1}(1), \theta^{-1}(0))$. The mapping of $P^{-1}$ on
$X = (x_{n-1}, \ldots, x_1, x_0) \in [0, N-1]$ is obtained as follows:

$$P^{-1} \in CP(n) \quad (N = 2^n)$$

$$P^{-1}(x_{n-1}, \ldots, x_1, x_0) = (P^{-1}(x_{n-1}), \ldots, P^{-1}(x_1), P^{-1}(x_0)) = (z_{n-1}, \ldots, z_1, z_0).$$

For example, consider $n = 4$ and let $V = (2,1,0,3)$ and $X = (1,0,0,1)$. We have $y_3 = x_2$,
$y_2 = x_1$, $y_1 = x_0$ and $y_0 = x_3$ (that is, $y_i = x_{\theta(i)}$). Hence, $P(X) = (0,0,1,1)$. Similarly, we
have $V^{-1} = (0,3,2,1)$ and $P^{-1}(X) = (1,1,0,0)$.
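
The example can be reproduced mechanically. The Python sketch below is one possible reading of Definitions 1 and 2, assuming (as the example suggests) that output bit $y_i$ is taken from input bit $x_{\theta(i)}$. Vectors and addresses are written most significant position first, as in the text. The function names `cp_apply` and `cp_inverse` are hypothetical.

```python
def _at(t, i):
    """Entry of tuple t at bit position i, where position 0 is the rightmost entry."""
    return t[len(t) - 1 - i]


def cp_apply(v, x):
    """Apply the CP permutation with vector v = (theta(n-1), ..., theta(0)) to bit tuple x.

    Assumption taken from the worked example: y_i = x_{theta(i)}.
    """
    n = len(v)
    return tuple(_at(x, _at(v, i)) for i in range(n - 1, -1, -1))


def cp_inverse(v):
    """Vector (theta^-1(n-1), ..., theta^-1(0)) of the inverse CP permutation."""
    n = len(v)
    inv = [0] * n
    for i in range(n):
        inv[_at(v, i)] = i          # theta^-1(theta(i)) = i
    return tuple(inv[i] for i in range(n - 1, -1, -1))


v = (2, 1, 0, 3)
x = (1, 0, 0, 1)
print(cp_apply(v, x))                # (0, 0, 1, 1)
print(cp_inverse(v))                 # (0, 3, 2, 1)
print(cp_apply(cp_inverse(v), x))    # (1, 1, 0, 0)
```

Applying `cp_inverse(v)` after `v` returns the original address, which is a quick way to confirm that a vector has been inverted correctly.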

Note that the $k$-subshuffle $\sigma_k$, the perfect shuffle $\sigma$ (i.e., the $(n-1)$-subshuffle), the
butterfly $\beta_k$ and the bit reversal $\rho$ are defined as follows:

$$\sigma(X) = (x_{n-2}, \ldots, x_1, x_0, x_{n-1})$$

$$\rho(X) = (x_0, x_1, \ldots, x_{n-2}, x_{n-1})$$
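
As a minimal sketch, the perfect shuffle and the bit reversal can be written directly on integer labels, together with the $CP(n)$ vectors that describe them. The code assumes the convention $y_i = x_{\theta(i)}$ with vectors listed from $\theta(n-1)$ down to $\theta(0)$. All function names are hypothetical.

```python
def perfect_shuffle(x, n):
    """sigma: circular left shift of the n-bit label x."""
    mask = (1 << n) - 1
    return ((x << 1) & mask) | (x >> (n - 1))


def bit_reversal(x, n):
    """rho: reverse the order of the n bits of x."""
    y = 0
    for i in range(n):
        y |= ((x >> i) & 1) << (n - 1 - i)
    return y


def shuffle_vector(n):
    """CP vector of sigma: (n-2, ..., 1, 0, n-1)."""
    return tuple(range(n - 2, -1, -1)) + (n - 1,)


def reversal_vector(n):
    """CP vector of rho: (0, 1, ..., n-1)."""
    return tuple(range(n))


n = 4
print(format(perfect_shuffle(0b1001, n), "04b"))   # 0011
print(format(bit_reversal(0b0001, n), "04b"))      # 1000
print(shuffle_vector(n))                           # (2, 1, 0, 3)
print(reversal_vector(n))                          # (0, 1, 2, 3)
```

For $n = 4$ the shuffle vector is $(2,1,0,3)$, the vector used in the example after Definition 2, and the reversal vector is $(0,1,2,3)$.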

These functions are examples of general shuffle permutations and are used to define the basic
connection patterns to design the six MINs studied by Wu and Feng [WuFe80]. It is
particularly worthwhile to discuss a special subset of permutations in $CP(n)$ with $\theta(0) = 0$, because
these permutations impose remarkable restrictions on the design of useful $n$-stage MINs, as we
shall see later.

As the connection patterns defined by $N$-symbol permutations are considered, the switching
elements on stages at both sides of a connection pattern can be labeled from 0 to
$\frac{N}{2} - 1 = 2^{n-1} - 1$ following the same ordering as the labeling of their output (input) links. That is, let
$Y = (x_{n-1}, \ldots, x_1)$ be the label of a switching element; then the output (input) links connected
to this switching element are labeled

$$X_0 = (x_{n-1}, \ldots, x_1, 0)$$

$$X_1 = (x_{n-1}, \ldots, x_1, 1).$$

These $X_0$ and $X_1$ are named the 0-output (0-input) link and 1-output (1-input) link with
respect to $Y$, respectively. For a connection pattern defined by a CP permutation $P$
connecting two adjacent switching stages, say stage $j$ and stage $j+1$ (excluding the special case
specified by $\theta(0) = 0$), the 0-output link $(x_{n-1}, \ldots, x_1, 0)$ of a switching element
$Y = (x_{n-1}, \ldots, x_1)$ at stage $j$ is connected to the input link $(P(x_{n-1}), \ldots, P(x_1), P(x_0))$ of a
switching element $(P(x_{n-1}), \ldots, P(x_1))$ at stage $j+1$ such that some $P(x_i) = 0$, $1 \le i \le n-1$.
Similarly, the 1-output link $(x_{n-1}, \ldots, x_1, 1)$ is connected to the input link
$(P(x_{n-1}), \ldots, P(x_1), P(x_0))$ of a switching element $(P(x_{n-1}), \ldots, P(x_1))$ such that $P(x_i) = 1$.
These two switching elements, which connect to the 0-output link and 1-output link of $Y$, are called
$Y$'s 0-successor and 1-successor, respectively. We denote them as $succ^0(Y)$ and $succ^1(Y)$. In a
similar way, for a switching element $Z = (y_{n-1}, \ldots, y_1)$ at stage $j+1$, $prec^0(Z)$ and $prec^1(Z)$ at
stage $j$, which connect to the 0-input link and 1-input link of $Z$, represent $Z$'s 0-predecessor and
1-predecessor, respectively. For those connection patterns defined by CP permutations with
$\theta(0) = 0$, we have $succ^0(Y) = succ^1(Y)$ and $prec^0(Z) = prec^1(Z)$. Therefore, for a
connection pattern defined by a $P \in CP(n)$ connecting stage $j$ and stage $j+1$, we always have
the following relation: let $Y = (x_{n-1}, \ldots, x_1)$, $Y \in [0, \frac{N}{2}-1]$, be a switching element at stage $j$.
If $P$ is specified by $\theta(0) \ne 0$, then $succ^1(Y) - succ^0(Y) = 2^{i-1}$ for the $i \in [1, n-1]$ with
$\theta(i) = 0$; else $succ^1(Y) - succ^0(Y) = 0$. Similarly, let $Z = (y_{n-1}, \ldots, y_1)$, $Z \in [0, \frac{N}{2}-1]$, be a
switching element at stage $j+1$. If $P$ is specified by $\theta(0) \ne 0$, then
$prec^1(Z) - prec^0(Z) = 2^{k-1}$ for $k = \theta(0) \in [1, n-1]$; else $prec^1(Z) - prec^0(Z) = 0$.
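
The successor relation can be checked numerically. The Python sketch below assumes that a switching element's label is the label of either of its links with the least significant bit dropped, and that a CP vector acts as $y_i = x_{\theta(i)}$. The function names `cp_apply_int` and `successors` are hypothetical.

```python
def cp_apply_int(v, x):
    """Apply CP vector v = (theta(n-1), ..., theta(0)) to the n-bit integer x."""
    n = len(v)
    y = 0
    for i in range(n):
        theta_i = v[n - 1 - i]               # theta(i)
        y |= ((x >> theta_i) & 1) << i       # y_i = x_{theta(i)}
    return y


def successors(v, y):
    """Return (succ0, succ1) of switching element y under connection pattern v."""
    x0, x1 = y << 1, (y << 1) | 1            # 0-output and 1-output links of y
    # Assumption: a switch label is its link label with the LSB dropped.
    return cp_apply_int(v, x0) >> 1, cp_apply_int(v, x1) >> 1


n = 4
for v in [(0, 1, 2, 3), (1, 3, 2, 0)]:
    diffs = {s1 - s0 for s0, s1 in (successors(v, y) for y in range(1 << (n - 1)))}
    print(v, diffs)
# (0, 1, 2, 3) {4}   bit reversal: the two successors differ by a power of two
# (1, 3, 2, 0) {0}   theta(0) = 0: both output links lead to the same switch
```

The second pattern illustrates why vectors with $\theta(0) = 0$ are treated as a special case: both outputs of a switching element arrive at the same switching element of the next stage.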

Our $CP(n)$ connection patterns satisfy the definition of independent connections
[BeFo88]; thus, they are independent connections, too. For example, in Fig. 2.1, we show a
connection pattern defined by a $CP(4)$ permutation on the numbers from 0 to 15. The
permutation is specified by $V = (0,1,2,3)$, which is the bit reversal function $\rho$. In Fig. 2.2, we show
another connection pattern, specified by the vector $V = (1,3,2,0)$, in which $\theta(0) = 0$.

2.3.  TOPOLOGICAL  STRUCTURE  OF  THE  $\log_2 N$  $CP^{min}$

A MIN is said to have the Banyan Property if and only if for any input and output there
exists a unique path connecting them, i.e., its inputs and outputs are fully connected. Any
Banyan type MIN can easily be modeled by a Banyan graph in which vertices represent
switching elements and arcs represent connection links [Agr83]. The structure of a Banyan
graph is essentially an overlay of tree structures and assures full connectivity among base and
apex vertices without redundancy. In particular, those $N \times N$ $n$-stage Banyan type MINs
[Fen81] constructed with $2 \times 2$ switching elements can be modeled by $(2,2,n)$ rectangular SW
Banyan graphs; Fig. 2.3 shows the baseline network and its corresponding Banyan graph
representation. As mentioned in some previous works [WuFe80][Agr83][HuTr86][BeFo88], all the
networks with graph representations isomorphic to the same Banyan graph are topologically
equivalent.

Fig. 2.1. The connection pattern defined by a $CP(4)$ permutation
$V = (0,1,2,3)$, i.e., a bit reversal function $\rho$.

Fig. 2.2. The connection pattern defined by
$V = (1,3,2,0)$ in which $\theta(0) = 0$.

It is a direct conclusion from Section 2.2 and [BeFo88] that if there exist $n$-stage Banyan
type MINs which can be defined by $(n+1)$-level $CP(n)$ permutations, they must be
topologically equivalent to one another and their graph representations are isomorphic to that of a
baseline network. What we are concerned with in this section is how to construct a class of
Banyan type MINs (i.e., the class of $CP^{min}$ in our notation) using connection patterns defined
by CP type permutations. In order to give a more intuitive explanation of their functional
properties, a path-descriptive methodology is adopted hereafter. Our point of view is that, to
route a message easily from a source input to a destination output in a MIN, the routing
scheme should preserve the information of source and destination addresses and, what is
more significant, indicate the topological structure of this MIN.

Consider an $N \times N$ $n$-stage MIN, $\Gamma$ (see Fig. 2.4), consisting of $n$ switching stages and
$n+1$ connection patterns defined by $CP(n)$ permutations. $\Gamma$ can be defined as

$$\Gamma = P^n E^{n-1} P^{n-1} \cdots E^0 P^0$$

where $P^i \in CP(n)$, $E^i = E$ (for all $i$) denote switching stages, and the superscript $i$ specifies
the $i$th stage. The effect of a switching stage $E$ is an exchange permutation which is obtained
as follows:

For $X = (x_{n-1}, \ldots, x_1, x_0)$, $X \in [0, N-1]$,

$$E(X) = (x_{n-1}, \ldots, x_1, d)$$

where $d = x_0$ or $\bar{x}_0$ (the complement of $x_0$).

Fig. 2.3. The Baseline network and its Banyan graph representation.

Fig. 2.4. An $N \times N$ $n$-stage MIN $\Gamma$, consisting of $n$ switching stages (stage 0 to stage $n-1$) and
$(n+1)$-level connection patterns defined by $CP(n)$ permutations.

There are two kinds of operations on $\Gamma$. A $P^i$ permutes the binary bits of the operand according
to its corresponding vector $V^i = (\theta^i(n-1), \ldots, \theta^i(1), \theta^i(0))$, and an $E^i$ replaces the least
significant bit (LSB) of the operand either by the original bit or by its complement. Generally
speaking, the result of an operand $X$ processed by $P^i$ and $E^i$ consecutively can be expressed
as follows:

$$X = (x_{n-1}, \ldots, x_1, x_0)$$

$$X' = P^i(X) = (y_{n-1}, \ldots, y_1, y_0)$$

$$X'' = E^i(X') = (y_{n-1}, \ldots, y_1, d), \quad d = y_0 \text{ or } \bar{y}_0.$$

Here, we compose functions from right to left so that for operations $P^i$ and $E^i$ over
$X \in [0, N-1]$, $E^i P^i$ is defined as $E^i(P^i(X))$.

For $\Gamma$ to be a Banyan type MIN, there exists a unique path from any input
$S = (s_{n-1}, \ldots, s_1, s_0)$ to an arbitrary output $D = (d_{n-1}, \ldots, d_1, d_0)$. Conceptually, we can
imagine that $S$ is propagated through $\Gamma$ to $D$ by those $2n+1$ operations applied consecutively.
Hence, we can get a unique valid transition sequence consisting of $2n+1$ binary numbers, i.e.,

$$D^0 = P^0(S)$$

$$D^{(0)(0)} = E^0(D^0)$$

$$D^1 = P^1(D^{(0)(0)})$$

$$D^{(1)(1)} = E^1(D^1)$$

$$\vdots$$

$$D^{(n-1)(n-1)} = E^{n-1}(D^{n-1})$$

and eventually,

$$D^n = P^n(D^{(n-1)(n-1)}) = D.$$
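
As an illustration, the transition sequence can be generated for the Omega network, where $P^0, \ldots, P^{n-1}$ are perfect shuffles and $P^n$ is the identity. The Python sketch below uses the familiar destination-tag rule for the Omega network: stage $i$ writes destination bit $d_{n-1-i}$ into the LSB. The function names are hypothetical.

```python
def shuffle(x, n):
    """Perfect shuffle: circular left shift of the n-bit label x."""
    mask = (1 << n) - 1
    return ((x << 1) & mask) | (x >> (n - 1))


def omega_transition_sequence(source, dest, n):
    """Return the 2n+1 link addresses D^0, D^(0)(0), ..., D^n from source to dest."""
    sequence = []
    addr = source
    for stage in range(n):
        addr = shuffle(addr, n)                  # D^stage: input link of this stage
        sequence.append(addr)
        tag = (dest >> (n - 1 - stage)) & 1      # destination bit consumed at this stage
        addr = (addr & ~1) | tag                 # E^stage replaces the LSB
        sequence.append(addr)                    # D^(stage)(stage): output link
    sequence.append(addr)                        # D^n: P^n is the identity for Omega
    return sequence


n = 3
seq = omega_transition_sequence(0b101, 0b010, n)
print([format(a, "03b") for a in seq])
# ['011', '010', '100', '101', '011', '010', '010']
```

The last entry equals the destination for every source and destination pair, and the sequence always has $2n+1$ entries.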

In their mathematical meaning, $D^i$ and $D^{(i)(i)}$ are the mappings of the operands $D^{(i-1)(i-1)}$
and $D^i$ performed by $P^i$ and $E^i$, respectively. From another point of view, according to the definition
of basic connection patterns, we can say that each $D^i$ is the address of the input link through
which the path from $S$ to $D$ traverses at stage $i$. Similarly, each $D^{(i)(i)}$ is the address of the
output link traversed at stage $i$. More precisely, the transition sequence has the following form:


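$$S = (s_{n-1}, \ldots, s_1, s_0)$$

$$D^0 = (s_{n-1}^0, \ldots, s_1^0, s_0^0)$$

$$D^{(0)(0)} = (s_{n-1}^0, \ldots, s_1^0, d_0^0)$$

$$D^1 = (s_{n-1}^1, \ldots, s_1^1, s_0^1)$$

$$\vdots$$

$$D^{(n-1)(n-1)} = (s_{n-1}^{n-1}, \ldots, s_1^{n-1}, d_0^{n-1})$$

$$D^n = (d_{n-1}, \ldots, d_1, d_0) = D.$$
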
Instead of being a description using a graph model, the above is a path-descriptive outline of
the structure of an $n$-stage Banyan type MIN defined by $CP(n)$ permutations. The following
theorem gives a necessary and sufficient condition for the class of $n$-stage MINs defined by
$CP(n)$ permutations to satisfy the Banyan property.

THEOREM 1: Let $\Gamma$ be an $n$-stage MIN defined by $CP(n)$ permutations. $\Gamma$ is an
$n$-stage Banyan type MIN if and only if there exists an $O \in CP(n)$ specified by a vector
$V^O = (\theta^O(n-1), \ldots, \theta^O(1), \theta^O(0))$ such that

$$O(S) = O(s_{n-1}, \ldots, s_1, s_0) = (s_0^{n-1}, \ldots, s_0^1, s_0^0)$$

is true for each transition sequence representing the path from any input
$S = (s_{n-1}, \ldots, s_1, s_0)$ to any output $D = (d_{n-1}, \ldots, d_1, d_0)$. Here, $s_0^i$ is the LSB of $D^i$ in a
transition sequence.

Proof: (only if) Let $i, j, k, m \in [0, n-1]$. We assume $\Gamma$ is a Banyan type MIN, so there
exists a unique path between any input and output of $\Gamma$. First, each $P^i$ in $\Gamma$ is a fixed
permutation pattern which can only permute the bits of its operand. Second, in each transition sequence
the only chances to change the value of a bit occur at the $n$ exchange permutations
corresponding to the $n$ switching stages, where the LSBs of their operands can be changed. Thus,
in any transition sequence each $s_i$ must be given exactly one chance to be changed to some
desired $d_j$. This is true because if any $s_i$ gets more than one chance to be changed, then at
least one $s_m$, $m \ne i$, has no chance to be changed, and each input $S$ could reach no more than
$2^{n-1}$ of the total $2^n$ outputs. This contradicts our assumption. Moreover, for each input $S$
two different transition sequences cannot reach the same output $D$. They are different in the
LSB of at least one $D^{(k)(k)}$ if and only if the outputs they reach are different. Therefore, each
$s_i$ is allowed to appear exactly once at the position of the LSB in some $D^k$. That is, $s_i = s_0^k$
for some $i, k \in [0, n-1]$, and there is a one-to-one mapping between the subscript $i$ and
superscript $k$. Obviously, for $\Gamma$ to be a Banyan type MIN, it is the responsibility of each $P^k$, where
$k \in [0, n-1]$, to bring some unique $s_i$ in $S$ to the position of the LSB exactly once. Here, $s_i$
gets the only chance to be changed to a desired $d_j$ after the operation of a switching stage.
However, $P^n$ can be an arbitrary permutation in $CP(n)$. Thus, the existence of a
permutation function $O \in CP(n)$ such that

$$O(s_{n-1}, \ldots, s_1, s_0) = (s_0^{n-1}, \ldots, s_0^1, s_0^0)$$

is clear. Let function $O$ be specified by a vector $V^O = (\theta^O(n-1), \ldots, \theta^O(1), \theta^O(0))$. Then
$V^O$ represents the order of bits of $S$ to be permuted to the position of the LSB in a transition
sequence.

(if) Since there exists a function O ∈ CP(n) such that

O(s_{n-1}, ..., s_1, s_0) = (s_0^{n-1}, ..., s_0^1, s_0^0)

is true for all the transition sequences on Γ, each bit of an arbitrary S gets only one chance
to appear in the position of LSB where it can be changed to a desired value after the
operation of a switching stage. That is to say, any input S has a unique transition sequence
or path to reach any one of the 2^n outputs. Therefore, there exists a unique path between any
input and any output on Γ. □

In Theorem 1, we outlined one characteristic on the topological structure of the CP^min as
the necessary and sufficient condition for each MIN in the class to satisfy the Banyan
property. For a complete analysis of the CP^min, there is still another thing which should be
noted on the transition sequence. After each s_i is permuted to the position of the LSB in
some D^k, it must be replaced by a desired d_j as the result of E^k, i.e., d_j = d^k. Then, in
D^{k+1}, d^k is permuted to the position of bit l where l ≠ 0 (i.e., d^k = s_l^{k+1}), because
another s_m (m ≠ i) should appear in the position of LSB as required by the necessary
condition that Γ needs to be a Banyan type MIN. As a result, we have the following relation
in each D^{i+1}:

{d^i, ..., d^0} ⊂ {s_{n-1}^{i+1}, ..., s_1^{i+1}}, for all i ∈ [0, n-2]

and

d^j ∈ {s_{n-1}^n, ..., s_1^n, s_0^n}, for all j ∈ [0, n-1].

Eventually, after the mapping of P^n, all the d^k where k ∈ [0, n-1] will be arranged in the
correct position corresponding to D = (d_{n-1}, ..., d_1, d_0) such that d_j = d^k. Clearly,
there exists another permutation function I ∈ CP(n) which is related to the order in which
bits of D are to be replaced. Given a Γ ∈ CP^min, the function I is an inherent characteristic
in addition to the function O. We can use I and O to uniquely describe the topological
structure of a CP^min type MIN. We call I and O characteristic functions.

DEFINITION 3: Let Γ be an n-stage CP^min type MIN. Γ can be characterized by the two CP
permutation functions O and I, named as characteristic functions. The function O has the same
definition as described in Theorem 1. The function I ∈ CP(n) is specified by a vector
V^I = (θ^I(n-1), ..., θ^I(1), θ^I(0)) such that

(d_{n-1}, ..., d_1, d_0) = I(d^{n-1}, ..., d^1, d^0)
                         = (d^{θ^I(n-1)}, ..., d^{θ^I(1)}, d^{θ^I(0)}).

Here, d^i is the LSB of E^i(D^i) in a transition sequence. □

Now, we can conclude the above discussion as follows. As long as all the n+1 selected
connection patterns used to construct a CP^min type MIN Γ satisfy Theorem 1, the topological
structure of Γ can be specified by two characteristic functions O and I as defined in
Definition 3. In other words, Theorem 1 and Definition 3 imply that any transition sequence of
a CP^min type MIN has the following form:

S = (s_{n-1}, ..., s_1, s_0)

D^0 = (s_{n-1}^0, ..., s_1^0, s_{θ^O(0)})

D^1 = (s_{n-1}^1, ..., s_1^1, s_{θ^O(1)})

. . .

D^{n-1} = (s_{n-1}^{n-1}, ..., s_1^{n-1}, s_{θ^O(n-1)})

E^{n-1}(D^{n-1}) = (s_{n-1}^{n-1}, ..., s_1^{n-1}, d^{n-1})

D^n = (s_{n-1}^n, ..., s_1^n, s_0^n)

    = (d_{n-1}, ..., d_1, d_0)

where {d^i, ..., d^0} ⊂ {s_{n-1}^{i+1}, ..., s_1^{i+1}} for all i ∈ [0, n-2], and
d^j ∈ {s_{n-1}^n, ..., s_1^n, s_0^n} for all j ∈ [0, n-1].

Thus, the values of θ^O(i) and θ^I(i) of a CP^min type MIN can be easily obtained from its
transition sequence. That is to say, (θ^O(n-1), ..., θ^O(1), θ^O(0)) is a permutation on the
subscripts of the s_i's and (θ^I(n-1), ..., θ^I(1), θ^I(0)) is a permutation on the
superscripts of the d^i's.




In Fig. 2.5, we depict a 16×16 4-stage CP^min type MIN Γ = P^4 E^3 ··· E^0 P^0, where
V^0 = (2,1,0,3), V^1 = (2,3,0,1), V^2 = (0,1,3,2), V^3 = (2,0,3,1) and V^4 = (1,3,0,2). The
general form of transition sequences on Γ is as follows:

S        = (s_3, s_2, s_1, s_0)         D^0 = (s_2, s_1, s_0, s_3)
E^0(D^0) = (s_2, s_1, s_0, d^0)         D^1 = (s_1, s_2, d^0, s_0)
E^1(D^1) = (s_1, s_2, d^0, d^1)         D^2 = (d^1, d^0, s_1, s_2)
E^2(D^2) = (d^1, d^0, s_1, d^2)         D^3 = (d^0, d^2, d^1, s_1)
E^3(D^3) = (d^0, d^2, d^1, d^3)         D^4 = (d^1, d^0, d^3, d^2) = D.

We have O(s_3, s_2, s_1, s_0) = (s_1, s_2, s_0, s_3) and
(d_3, d_2, d_1, d_0) = I(d^3, d^2, d^1, d^0) = (d^1, d^0, d^3, d^2),
i.e., V^O = (1,2,0,3) and V^I = (1,0,3,2).
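The derivation above can be replayed mechanically. The following Python sketch is only an
illustration and not part of the original development: the helper names apply_pattern and
characteristic_vectors are hypothetical, and the code assumes that a vector
V = (θ(n-1), ..., θ(0)) moves the bit at position θ(p) to position p, which is the reading
that reproduces D^0 = (s_2, s_1, s_0, s_3) from V^0 = (2,1,0,3).

```python
def apply_pattern(v, bits):
    """Apply a CP(n) pattern V = (theta(n-1), ..., theta(0)) to a word.

    `bits` is written MSB first, so bits[n-1-p] is the bit at position p.
    Position p of the result receives the bit found at position theta(p).
    """
    n = len(bits)
    return tuple(bits[n - 1 - theta] for theta in v)


def characteristic_vectors(patterns):
    """Return (V_O, V_I) for patterns [V^0, ..., V^n], or None if some
    stage sees an already replaced bit at the LSB (Theorem 1 violated)."""
    n = len(patterns) - 1
    # Symbolic word: ("s", i) stands for s_i and ("d", k) for d^k.
    word = tuple(("s", i) for i in range(n - 1, -1, -1))
    lsb_order = []
    for k in range(n):
        word = apply_pattern(patterns[k], word)   # D^k
        kind, index = word[-1]
        if kind != "s":
            return None
        lsb_order.append(index)                   # theta^O(k)
        word = word[:-1] + (("d", k),)            # E^k(D^k)
    word = apply_pattern(patterns[n], word)       # D^n
    v_o = tuple(reversed(lsb_order))
    v_i = tuple(index for _, index in word)
    return v_o, v_i


FIG_2_5 = [(2, 1, 0, 3), (2, 3, 0, 1), (0, 1, 3, 2), (2, 0, 3, 1), (1, 3, 0, 2)]
V_O, V_I = characteristic_vectors(FIG_2_5)
print(V_O, V_I)
```

For the five vectors of Fig. 2.5 the sketch yields the same V^O and V^I as the hand
derivation; a sequence of identity patterns is rejected because no new s_i reaches the LSB
after stage 0.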

As we pointed out above, the combined effect of permuting bits of the operation sequence
P^n ··· P^0 on a CP^min type MIN Γ can be reflected by two permutation functions O and I. The
relationship between O, I, and the operation sequence P^n ··· P^0 can be described by the
following lemma.

LEMMA 1: On a CP^min type MIN Γ, the characteristic function O is uniquely determined by the
sequence P^{n-1} ··· P^0 and the characteristic function I is uniquely determined by the
sequence P^n ··· P^1.

Proof: Consider an arbitrary transition sequence of Γ. After the operation of P^i,
i ∈ [0, n-1], bit s_{θ^O(i)} of input S is permuted to the position of LSB in D^i. Clearly,
P^n has nothing to do with the function O. Similarly, in each E^i(D^i), s_{θ^O(i)} is replaced
by bit d^i. Each d^i is then permuted by P^{i+1} and preserved in D^{i+1}. Thus, the final
order of the d^i's in D^n = D, which can be specified by the function I, has nothing to do
with P^0. □

Fig. 2.5. A 16 × 16 4-stage CP^min type MIN constructed with connection patterns specified
by vectors V^0 = (2,1,0,3), V^1 = (2,3,0,1), V^2 = (0,1,3,2), V^3 = (2,0,3,1) and
V^4 = (1,3,0,2). The conflicting path connections from S = (0,0,0,1) to D = (1,1,0,0) and
A = (1,0,0,0) to B = (1,1,1,1) are also shown.

Now, we would like to investigate the topological characteristics of the reversal network of
a CP^min type MIN. The reversal network Γ^R of a CP^min type MIN Γ is the network obtained
from Γ by reversing the direction of each connection pattern, and replacing each input
(output) by an output (input) without changing its label. Let P^{-(k)} denote the inverse
function of a permutation function P^k ∈ CP(n). We have the following lemma on the
topological structure of a reversal network.

LEMMA 2: The reversal network Γ^R of a CP^min type MIN Γ defined as a permutation sequence
P^{(n)} E^{(n-1)} ··· E^{(0)} P^{(0)} is a CP^min type MIN:

Γ^R = P^{-(0)} E^{(0)} P^{-(1)} ··· E^{(n-1)} P^{-(n)}

Moreover, Γ^R has two characteristic functions O^R and I^R, where O^R is uniquely determined
by P^{-(1)} ··· P^{-(n)} and I^R is uniquely determined by P^{-(0)} P^{-(1)} ··· P^{-(n-1)}.

Proof: Since the reversal connection pattern of each connection pattern defined by P^k can be
defined by the inverse function P^{-(k)}, and the reversal switching stage of a switching
stage is still the same switching stage, it is clear that the reversal network of Γ can be
expressed as

Γ^R = P^{-(0)} E^{(0)} P^{-(1)} ··· E^{(n-1)} P^{-(n)}

Besides, because of the isomorphism between graph models of Γ and Γ^R, Γ^R satisfies the
Banyan property, i.e., Γ^R belongs to n-stage CP^min. Thus, by a similar proof as that for
Lemma 1, we can show that there exist two characteristic functions O^R and I^R such that O^R
is uniquely determined by the sequence P^{-(1)} ··· P^{-(n)} and I^R is uniquely determined by

P^{-(0)} P^{-(1)} ··· P^{-(n-1)}. □

In Table 2.1, the characteristic functions of several famous MINs and their reversal
networks are summarized.

Table 2.1. The characteristic functions of several famous MINs and their reversal networks.

MIN                                     V^O                    V^I
Delta network                           (1, 2, ..., n-1, 0)    (0, ..., n-2, n-1)
reversal Delta network                  (n-1, ..., 1, 0)       (0, n-1, ..., 2, 1)
Omega network                           (0, ..., n-2, n-1)     (0, ..., n-2, n-1)
reversal Omega network                  (n-1, ..., 1, 0)       (n-1, ..., 1, 0)
Baseline network                        (n-1, ..., 1, 0)       (0, ..., n-2, n-1)
reversal Baseline network               (n-1, ..., 1, 0)       (0, ..., n-2, n-1)
Indirect Binary Cube network            (n-1, ..., 1, 0)       (n-1, ..., 1, 0)
reversal Indirect Binary Cube network   (0, ..., n-2, n-1)     (0, ..., n-2, n-1)

Three conclusions can be made for this section. First, it is very easy to observe that,
except for P^0 and P^n, no other P^k (k ∈ [1, n-1]) can be a CP(n) permutation function
specified by θ^k(0) = 0. This is because any P^k specified by θ^k(0) = 0 in V^k will cause
some bit s_i of input S to lose the chance to appear at the position of LSB in the transition
sequence, and the n-stage MIN Γ will no longer preserve the Banyan property. This is a
forbidden case described in Theorem 1. Second, as the equivalence problem is considered on the
topologically equivalent class of CP^min, some subdivisions can be made. It is very easy to
verify that there exists a one-to-many mapping between functions O or I, and MINs in the
CP^min. The equivalence relationship between arbitrary MINs in the CP^min can be classified
into the following categories:

The class of O-equivalent MINs: with the same function O but different function I.

The class of I-equivalent MINs: with the same function I but different function O.

The class of O/I-equivalent MINs: with the same function I and function O.

Therefore, the whole class of CP^min can be partitioned into (n!)^2 O/I-equivalent classes.
Each class with specified functions O and I has [(n-1)!]^n different drawings. For example,
it can be shown that a 4-stage CP^min type MIN with connection patterns specified by
V^0 = (1,2,0,3), V^1 = (2,3,0,1), V^2 = (2,0,1,3), V^3 = (1,0,2,3) and V^4 = (1,3,0,2) is
O/I-equivalent to the MIN depicted in Fig. 2.5. Third, the equivalence relation in the CP^min
can easily be described by linear transformations between two arbitrary MINs. Assume that two
CP^min type MINs, Γ_1 and Γ_2, are specified by functions O_1, I_1 and O_2, I_2, respectively.
If we transform Γ_1 to Γ_2, we can always have the following relation: there exist two
functions F, R ∈ CP(n) such that

O_2 F = O_1 and R I_2 = I_1,

or

F = O_2^{-1} O_1 and R = I_1 I_2^{-1}.

Functions F and R represent two linear transformations or two fixed connection patterns
added before the first and after the last connection pattern of Γ_2. They can also be
interpreted as the renaming scheme on the inputs and outputs of Γ_2, i.e., the inputs are
renamed according to function F^{-1} and the outputs according to function R^{-1}. The
renaming scheme can easily transform one network to another network without any hardware
cost. It has a significant impact on designing reconfigurable systems.
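The O/I-equivalence example and the class counts quoted above can be checked by brute force
for small n. The sketch below is an illustration under stated assumptions rather than the
author's procedure: the helper names are hypothetical, a vector V = (θ(n-1), ..., θ(0)) is
assumed to move the bit at position θ(p) to position p, and a pattern sequence is accepted
as a CP^min type MIN exactly when every stage finds a not yet replaced input bit at the LSB
(the criterion of Theorem 1).

```python
from collections import Counter
from itertools import permutations, product
from math import factorial


def apply_pattern(v, bits):
    """Position p of the result receives the bit at position theta(p)."""
    n = len(bits)
    return tuple(bits[n - 1 - theta] for theta in v)


def characteristic_vectors(patterns):
    """Return (V_O, V_I) for [V^0, ..., V^n], or None if not Banyan."""
    n = len(patterns) - 1
    word = tuple(("s", i) for i in range(n - 1, -1, -1))
    lsb_order = []
    for k in range(n):
        word = apply_pattern(patterns[k], word)   # D^k
        kind, index = word[-1]
        if kind != "s":
            return None
        lsb_order.append(index)
        word = word[:-1] + (("d", k),)            # E^k(D^k)
    word = apply_pattern(patterns[n], word)       # D^n
    return tuple(reversed(lsb_order)), tuple(i for _, i in word)


FIG_2_5 = [(2, 1, 0, 3), (2, 3, 0, 1), (0, 1, 3, 2), (2, 0, 3, 1), (1, 3, 0, 2)]
OTHER = [(1, 2, 0, 3), (2, 3, 0, 1), (2, 0, 1, 3), (1, 0, 2, 3), (1, 3, 0, 2)]
same_class = characteristic_vectors(FIG_2_5) == characteristic_vectors(OTHER)

# Enumerate every choice of n+1 patterns for n = 3 and group by (V_O, V_I).
n = 3
classes = Counter()
for patterns in product(permutations(range(n)), repeat=n + 1):
    vectors = characteristic_vectors(patterns)
    if vectors is not None:
        classes[vectors] += 1

print(same_class, len(classes), set(classes.values()))
```

For n = 3 the enumeration finds (3!)^2 = 36 classes with [(3-1)!]^3 = 8 members each, in
agreement with the counts stated in the text.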

2.4.  ROUTING  ALGORITHM 

In a general MIN, routing is established by attaching to each input a path control
sequence  or  a  path  descriptor  [KrSn86]  to  lead  it  to  a  desired  output.  Generally  speaking, 
paths  from  different  inputs  to  the  same  output  may  have  different  control  sequences.  Thus,  a 
routing  table,  containing  a  path  control  sequence  for  each  output,  is  needed  at  each  input. 
From  the  viewpoint  of  simple  routing,  it  is  convenient  to  have  all  these  tables  identical. 

As discussed in previous sections, any path in a CP^min type MIN leading from an input to an
output can be represented by a transition sequence. Note that each D^i and E^i(D^i) in a
transition sequence represent the address of input and output links through which a path
traverses stage i. Moreover, the ordered set of all the LSB's in the D^i's,
(s_0^{n-1}, ..., s_0^1, s_0^0), which equals (s_{θ^O(n-1)}, ..., s_{θ^O(1)}, s_{θ^O(0)}), is a
permutation on the bits of the address of an input. Similarly, the binary representation of
an output (d_{n-1}, ..., d_1, d_0), which equals (d^{θ^I(n-1)}, ..., d^{θ^I(1)}, d^{θ^I(0)}),
is a mapping of a permutation on the ordered set of all the LSB's in the E^i(D^i)'s,
(d^{n-1}, ..., d^1, d^0). Therefore, three conclusions can be made on a CP^min type MIN.

First, any path connecting an input and an output preserves the information of input and
output addresses, and indicates the topological structure of this MIN reflected by functions
O and I. Second, the LSB's in the D^i's and E^i(D^i)'s represent labels of input and output
links in a switching element (i.e., 0-input (0-output) or 1-input (1-output)) through which a
path traverses stage i. Hence, at each stage, regardless of the input link through which a
path traverses, this path can always be routed to the desired output link (i.e., s_{θ^O(i)}
is replaced by d^i, which should be a binary bit of D). It is natural that each path control
sequence can be constructed by using only the address of a destination output. In this
chapter, this is referred to as destination-oriented. Third, since the content of a path
control sequence is only related to the function I and the destination output to which a
source input desires to route, all the routing tables are identical. This is required by a
simple routing scheme.

Thus, briefly speaking, the distributed routing on an n-stage CP^min type MIN Γ which is
characterized by O and I is accomplished by the source input attaching the path control
sequence T = (t_{n-1}, ..., t_1, t_0) as routing tags along with a request for connection. As
the request progresses through the stages of Γ, the switching element at stage i uses the tag
t_i from the path control sequence T to route the incoming request via the particular output
link determined by t_i, i.e., via the 0-output link if t_i = 0 and the 1-output link if
t_i = 1. Eventually, the request reaches the correct destination. In other words, the
distributed routing can be accomplished under local control at each switching element. Any
switching element is said to be in the 1-state if a crossing connection (from 0-input to
1-output or from 1-input to 0-output) has been established and in the 0-state if a straight
connection (from 0-input to 0-output or from 1-input to 1-output) has been established. Let
l_i be the label of the input link from which an incoming request comes and l_o be the label
of the output link to which the routing tag determines to route. Obviously, the state of a
switching element can be obtained by performing an Exclusive-OR on l_i and l_o. Next, we
study the general form of the routing algorithm for the log_2 N CP^min. Let XOR be the modulo
2 addition and S→D denote the path connection from input S = (s_{n-1}, ..., s_1, s_0) to
output D = (d_{n-1}, ..., d_1, d_0).

THEOREM 2: Let T = (t_{n-1}, ..., t_1, t_0) be the path control sequence of S→D on an
n-stage CP^min type MIN Γ. Then t_i has the following form:

t_i = I^{-1}(d_i).

Moreover, g_i = O(s_i) XOR I^{-1}(d_i) can determine the state of the switching element
through which S→D traverses at stage i. G = (g_{n-1}, ..., g_1, g_0) is called the path state
sequence.

Proof: As we remarked above, the path S→D can always be routed via the O(s_i)-input link,
i.e., the s_{θ^O(i)}-input link, of some switching element at stage i, so the selection of a
correct output link at stage i (i.e., the replacement of O(s_i) by I^{-1}(d_i)) is clearly
irrelevant to the incoming O(s_i)-input link. Therefore, as long as T is used as the path
control sequence, it is equivalent to saying that S→D traverses some switching element at
stage i from an O(s_i)-input link to an output link whose label is I^{-1}(d_i), i.e., d^i.
Thus, we have the following transition sequence:

S = (s_{n-1}, ..., s_1, s_0)

D^0 = (s_{n-1}^0, ..., s_1^0, s_{θ^O(0)})

. . .

D^n = I(I^{-1}(d_{n-1}), ..., I^{-1}(d_1), I^{-1}(d_0))

    = (d_{θ^I(θ^{I^{-1}}(n-1))}, ..., d_{θ^I(θ^{I^{-1}}(1))}, d_{θ^I(θ^{I^{-1}}(0))})

    = (d_{n-1}, ..., d_1, d_0).

Obviously, it is valid and I^{-1}(d_i) is the only possible routing tag which can be used at
stage i. Moreover, the state of the switching element at stage i is determined by
O(s_i) XOR I^{-1}(d_i). □

For example, the path control sequences and path state sequences of the famous MINs in
Table 2.1 are summarized in Table 2.2. For the MIN in Fig. 2.5, we have
I^{-1}(d_3, d_2, d_1, d_0) = (d_1, d_0, d_3, d_2). The path connection from S = (0,0,0,1) to
D = (1,1,0,0) is shown by a bold line, where from Theorem 2 we can get T = (0,0,1,1) and
G = (0,0,0,1).
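The tag computation of Theorem 2 can be tried out on the network of Fig. 2.5. The following
Python sketch is illustrative only: the helper names (apply_pattern, inverse_vector,
routing_tags, state_sequence, route) are hypothetical, and it assumes that a vector
V = (θ(n-1), ..., θ(0)) moves the bit at position θ(p) to position p, with words written
MSB first.

```python
from itertools import product


def apply_pattern(v, bits):
    """Position p of the result receives the bit at position theta(p)."""
    n = len(bits)
    return tuple(bits[n - 1 - theta] for theta in v)


def inverse_vector(v):
    """Vector of the inverse permutation, in the same MSB-first form."""
    n = len(v)
    inv = [0] * n
    for p in range(n):
        inv[n - 1 - v[n - 1 - p]] = p   # the inverse maps theta(p) back to p
    return tuple(inv)


def routing_tags(v_i, dest):
    """T = I^{-1}(D), so that t_i is the tag consumed at stage i."""
    return apply_pattern(inverse_vector(v_i), dest)


def state_sequence(v_o, v_i, source, dest):
    """G = O(S) XOR I^{-1}(D), bit by bit."""
    o_s = apply_pattern(v_o, source)
    return tuple(a ^ b for a, b in zip(o_s, routing_tags(v_i, dest)))


def route(patterns, source, tags):
    """Follow a request through the network, replacing the LSB by t_k."""
    n = len(source)
    word = source
    for k in range(n):
        word = apply_pattern(patterns[k], word)     # D^k
        word = word[:-1] + (tags[n - 1 - k],)       # E^k(D^k)
    return apply_pattern(patterns[n], word)         # D^n


FIG_2_5 = [(2, 1, 0, 3), (2, 3, 0, 1), (0, 1, 3, 2), (2, 0, 3, 1), (1, 3, 0, 2)]
V_O, V_I = (1, 2, 0, 3), (1, 0, 3, 2)
S, D = (0, 0, 0, 1), (1, 1, 0, 0)

T = routing_tags(V_I, D)
G = state_sequence(V_O, V_I, S, D)
all_reach = all(
    route(FIG_2_5, s, routing_tags(V_I, d)) == d
    for s in product((0, 1), repeat=4)
    for d in product((0, 1), repeat=4)
)
print(T, G, all_reach)
```

The sketch reproduces T = (0,0,1,1) and G = (0,0,0,1) for the bold path of Fig. 2.5, and the
final check confirms that the tags depend only on the destination: every source reaches
every destination with the same tag sequence.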

Table 2.2. The path control sequences and path state sequences of those MINs in Table 2.1.

MIN                                     T = (t_{n-1}, ..., t_1, t_0)    G = (g_{n-1}, ..., g_1, g_0)
Delta network                           (d_0, ..., d_{n-2}, d_{n-1})    (s_1 ⊕ d_0, s_2 ⊕ d_1, ..., s_{n-1} ⊕ d_{n-2}, s_0 ⊕ d_{n-1})
reversal Delta network                  (d_0, d_{n-1}, ..., d_2, d_1)   (s_{n-1} ⊕ d_0, s_{n-2} ⊕ d_{n-1}, ..., s_1 ⊕ d_2, s_0 ⊕ d_1)
Omega network                           (d_0, ..., d_{n-2}, d_{n-1})    (s_0 ⊕ d_0, ..., s_{n-2} ⊕ d_{n-2}, s_{n-1} ⊕ d_{n-1})
reversal Omega network                  (d_{n-1}, ..., d_1, d_0)        (s_{n-1} ⊕ d_{n-1}, ..., s_1 ⊕ d_1, s_0 ⊕ d_0)
Baseline network                        (d_0, ..., d_{n-1})             (s_{n-1} ⊕ d_0, ..., s_1 ⊕ d_{n-2}, s_0 ⊕ d_{n-1})
reversal Baseline network               (d_0, ..., d_{n-2}, d_{n-1})    (s_{n-1} ⊕ d_0, ..., s_1 ⊕ d_{n-2}, s_0 ⊕ d_{n-1})
Indirect Binary Cube network            (d_{n-1}, ..., d_1, d_0)        (s_{n-1} ⊕ d_{n-1}, ..., s_1 ⊕ d_1, s_0 ⊕ d_0)
reversal Indirect Binary Cube network   (d_0, ..., d_{n-2}, d_{n-1})    (s_0 ⊕ d_0, ..., s_{n-2} ⊕ d_{n-2}, s_{n-1} ⊕ d_{n-1})

Fig. 2.6. Two 3-stage non-CP^min Banyan type MINs, (a) and (b).

Theorem 2 provides an efficient routing scheme on a packet-switching CP^min type MIN. In
particular, in packet-switching networks, when messages are sent from inputs to outputs,
replies are returned to the sender. Thus, the address of the sender is needed. Instead of
attaching the sender address to a message, the address can be created while passing the MIN:
whenever bit I^{-1}(d_i) in the path control sequence is discarded at stage i, it is replaced
by bit O(s_i), which identifies the input link from which the message came. Eventually, this
message preserves the address of the sender as it arrives at the receiver. We should note
that not every Banyan type MIN has this natural property. For example, in Fig. 2.6, two
3-stage non-CP^min Banyan type MINs are shown. The MIN in Fig. 2.6(a) is topologically
equivalent to CP^min, but the other one in Fig. 2.6(b) is not. Even if their routing schemes
are also destination-oriented, there is no simple rule to preserve the sender address on them.

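The sender-address scheme described above can be simulated on the network of Fig. 2.5. The
Python sketch below is a hedged illustration, not the author's implementation: the function
names are hypothetical, the message header is modelled as a plain list holding T, and a
vector V = (θ(n-1), ..., θ(0)) is assumed to move the bit at position θ(p) to position p.

```python
def apply_pattern(v, bits):
    """Position p of the result receives the bit at position theta(p)."""
    n = len(bits)
    return tuple(bits[n - 1 - theta] for theta in v)


def inverse_vector(v):
    """Vector of the inverse permutation, in the same MSB-first form."""
    n = len(v)
    inv = [0] * n
    for p in range(n):
        inv[n - 1 - v[n - 1 - p]] = p
    return tuple(inv)


def send(patterns, v_i, source, dest):
    """Route a message and overwrite each used tag with the input-link bit.

    Returns the output reached and the header carried on arrival.
    """
    n = len(source)
    header = list(apply_pattern(inverse_vector(v_i), dest))  # T = I^{-1}(D)
    word = source
    for k in range(n):
        word = apply_pattern(patterns[k], word)   # D^k, LSB = input link label
        tag = header[n - 1 - k]                   # t_k is consumed here
        header[n - 1 - k] = word[-1]              # and replaced by O(s_k)
        word = word[:-1] + (tag,)                 # E^k(D^k)
    return apply_pattern(patterns[n], word), tuple(header)


FIG_2_5 = [(2, 1, 0, 3), (2, 3, 0, 1), (0, 1, 3, 2), (2, 0, 3, 1), (1, 3, 0, 2)]
V_O, V_I = (1, 2, 0, 3), (1, 0, 3, 2)

arrived_at, header = send(FIG_2_5, V_I, (0, 0, 0, 1), (1, 1, 0, 0))
# The receiver undoes O to recover the sender address from the header.
sender = apply_pattern(inverse_vector(V_O), header)
print(arrived_at, header, sender)
```

On arrival the header holds O(S), so applying O^{-1} recovers the sender address; this is the
sense in which the address is created while passing the MIN.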
In a CP^min type MIN, two path connections which do not result in connection conflicts
(i.e., they do not use the same output link at some switching element) are said to be
conflict-free. The next theorem is a necessary and sufficient condition for two conflict-free
path connections on the class of log_2 N CP^min. It is the extension of the theorem in
[WuFe81] which only deals with the Baseline network.

DEFINITION 4: Let S and A be two n-bit numbers. φ(S, A) yields the maximum number of
consecutively identical low-order bits of S and A. ψ(S, A) yields the maximum number of
consecutively identical high-order bits of S and A. □

For example, if S = (0,1,1,0,1,0) and A = (0,1,0,0,1,0), we have φ(S, A) = 3 and
ψ(S, A) = 2.
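Definition 4 translates directly into two short functions. This is a minimal sketch with
hypothetical names phi and psi, assuming n-bit numbers are given as tuples written MSB first.

```python
def phi(x, y):
    """Number of consecutively identical low-order bits of x and y."""
    count = 0
    for a, b in zip(reversed(x), reversed(y)):
        if a != b:
            break
        count += 1
    return count


def psi(x, y):
    """Number of consecutively identical high-order bits of x and y."""
    count = 0
    for a, b in zip(x, y):
        if a != b:
            break
        count += 1
    return count


S = (0, 1, 1, 0, 1, 0)
A = (0, 1, 0, 0, 1, 0)
print(phi(S, A), psi(S, A))
```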

THEOREM 3: In an n-stage CP^min type MIN characterized by O and I, two path connections S→D
and A→B, where S ≠ A and D ≠ B, are conflict-free if and only if

ψ(O(S), O(A)) + φ(I^{-1}(D), I^{-1}(B)) < n.

PROOF: (if) Suppose ψ(O(S), O(A)) = k and φ(I^{-1}(D), I^{-1}(B)) = k'. According to
Theorem 2, bit O(s_i) on the transition sequence is replaced by bit I^{-1}(d_i) after S→D
traverses stage i. Thus, generally speaking, at stage i, 0 ≤ i ≤ n-1, the output links
traversed by S→D and A→B are Π(O(s_{n-1}), ..., O(s_{i+1}), I^{-1}(d_i), ..., I^{-1}(d_0)) and
Π(O(a_{n-1}), ..., O(a_{i+1}), I^{-1}(b_i), ..., I^{-1}(b_0)), where Π ∈ CP(n). These two
labels are the E^i(D^i) on the transition sequences of S→D and A→B, respectively. By the
definition of φ, whenever 0 ≤ i < k', we have
(I^{-1}(d_i), ..., I^{-1}(d_0)) = (I^{-1}(b_i), ..., I^{-1}(b_0)). Since ψ(O(S), O(A)) = k and
k + k' < n, we must have k < n-k' ≤ n-i-1 and
(O(s_{n-1}), ..., O(s_{i+1})) ≠ (O(a_{n-1}), ..., O(a_{i+1})). Similarly, if k' ≤ i < n, then
(I^{-1}(d_i), ..., I^{-1}(d_0)) ≠ (I^{-1}(b_i), ..., I^{-1}(b_0)). Consequently,

Π(O(s_{n-1}), ..., O(s_{i+1}), I^{-1}(d_i), ..., I^{-1}(d_0)) ≠
Π(O(a_{n-1}), ..., O(a_{i+1}), I^{-1}(b_i), ..., I^{-1}(b_0))

at each stage i, 0 ≤ i ≤ n-1. Therefore, S→D and A→B are conflict-free.

(only if) Since S→D and A→B are conflict-free, we have

Π(O(s_{n-1}), ..., O(s_{i+1}), I^{-1}(d_i), ..., I^{-1}(d_0)) ≠
Π(O(a_{n-1}), ..., O(a_{i+1}), I^{-1}(b_i), ..., I^{-1}(b_0))

at each stage i, 0 ≤ i ≤ n-1. This implies that
(O(s_{n-1}), ..., O(s_{i+1})) ≠ (O(a_{n-1}), ..., O(a_{i+1})) or
(I^{-1}(d_i), ..., I^{-1}(d_0)) ≠ (I^{-1}(b_i), ..., I^{-1}(b_0)) at each stage i. As a
result, ψ(O(S), O(A)) + φ(I^{-1}(D), I^{-1}(B)) < n. □

For example, in Fig. 2.5, if S = (0,0,0,1), D = (1,1,0,0), A = (1,0,0,0) and B = (1,1,1,1),
we have O(S) = (0,0,1,0), I^{-1}(D) = (0,0,1,1), O(A) = (0,0,0,1), and I^{-1}(B) = (1,1,1,1).
Thus, ψ(O(S), O(A)) = 2 and φ(I^{-1}(D), I^{-1}(B)) = 2. Since 2 + 2 is not less than n = 4,
it follows from Theorem 3 that S→D and A→B are two conflicting path connections.
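The condition of Theorem 3 can be compared against a direct simulation of the network in
Fig. 2.5. The sketch below is an assumption-laden illustration: all function names are
hypothetical, a vector V = (θ(n-1), ..., θ(0)) is assumed to move the bit at position θ(p) to
position p, and two paths are taken to conflict when their output-link labels E^k(D^k)
coincide at some stage k.

```python
from itertools import product


def apply_pattern(v, bits):
    n = len(bits)
    return tuple(bits[n - 1 - theta] for theta in v)


def inverse_vector(v):
    n = len(v)
    inv = [0] * n
    for p in range(n):
        inv[n - 1 - v[n - 1 - p]] = p
    return tuple(inv)


def psi(x, y):
    """Identical high-order bits (Definition 4)."""
    count = 0
    for a, b in zip(x, y):
        if a != b:
            break
        count += 1
    return count


def phi(x, y):
    """Identical low-order bits (Definition 4)."""
    return psi(x[::-1], y[::-1])


def conflict_free(v_o, v_i, s, d, a, b):
    """Theorem 3: psi(O(S), O(A)) + phi(I^-1(D), I^-1(B)) < n."""
    inv_i = inverse_vector(v_i)
    high = psi(apply_pattern(v_o, s), apply_pattern(v_o, a))
    low = phi(apply_pattern(inv_i, d), apply_pattern(inv_i, b))
    return high + low < len(s)


def output_links(patterns, v_i, s, d):
    """Labels E^k(D^k) of the output links used by S -> D at each stage."""
    n = len(s)
    tags = apply_pattern(inverse_vector(v_i), d)
    word, links = s, []
    for k in range(n):
        word = apply_pattern(patterns[k], word)
        word = word[:-1] + (tags[n - 1 - k],)
        links.append(word)
    return links


FIG_2_5 = [(2, 1, 0, 3), (2, 3, 0, 1), (0, 1, 3, 2), (2, 0, 3, 1), (1, 3, 0, 2)]
V_O, V_I = (1, 2, 0, 3), (1, 0, 3, 2)
example = conflict_free(V_O, V_I, (0, 0, 0, 1), (1, 1, 0, 0),
                        (1, 0, 0, 0), (1, 1, 1, 1))

addresses = list(product((0, 1), repeat=4))
links = {(s, d): output_links(FIG_2_5, V_I, s, d)
         for s in addresses for d in addresses}
mismatches = 0
for s, d, a, b in product(addresses, repeat=4):
    if s == a or d == b:
        continue
    shares_link = any(x == y for x, y in zip(links[s, d], links[a, b]))
    if shares_link == conflict_free(V_O, V_I, s, d, a, b):
        mismatches += 1
print(example, mismatches)
```

For the two paths of the example the test reports a conflict, and over all pairs with S ≠ A
and D ≠ B on this network the simulated link sharing agrees with the inequality of
Theorem 3.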

2.5.  DECOMPOSITION  AND  PARTITIONING 

In this section, compared to the distributed routing property, we study two global
functional properties on the CP^min, i.e., its decomposition and partitioning.

The partitionability of a network is the ability to decompose the network into independent
subnetworks of different sizes [Sie79]. It is desirable that each subnetwork with smaller
size can have all the functional properties of the original network. A partitionable network
allows a system to be dynamically reconfigured into independent subsystems. It has a
significant impact on the application of parallel processing.

In the next theorem, we exploit the strict buddy property [Agr83] of the class of log_2 N
CP^min. It is an important auxiliary for the discussion on decomposition and partitioning.

Let Y_1 and Y_2 be two switching elements at a switching stage as discussed in Section II.
Y_1 and Y_2 are output buddies if succ^0(Y_1) = succ^0(Y_2) and succ^1(Y_1) = succ^1(Y_2);
input buddies if prec^0(Y_1) = prec^0(Y_2) and prec^1(Y_1) = prec^1(Y_2). A MIN has the strict
buddy property if and only if at each stage for each pair of input buddies there exists
another pair of input buddies such that they constitute two pairs of output buddies. Fig. 2.7
illustrates the buddy property.

THEOREM 4: The class of n-stage CP^min satisfies the strict buddy property.

PROOF: Let Γ be an n-stage CP^min type MIN with characteristic functions O and I as we have
defined. Even if the function O cannot uniquely determine the topological structure of Γ,
without loss of generality, we can construct Γ by using only, say, the k-subshuffle σ_k. If
other CP(n) type connection patterns are used, the proof is similar to the following.

Let SW_{x,j} be the x-th switching element at stage j of Γ, where x ∈ [0, N/2 - 1],
j ∈ [0, n-1], and its binary representation be (w_{n-1}, ..., w_1). Assume stage j is
connected to stage j+1 by a σ_k connection pattern. The labels of the successors at stage j+1
of SW_{x,j} are of the form (w_{n-1}, ..., w_{k+1}, w_{k-1}, ..., w_1, d), where d = 0 or 1.
It is obvious that we can always find a unique switching element SW_{y,j} which has the same
successors as SW_{x,j} if and only if its label is
(w_{n-1}, ..., w_{k+1}, \bar{w}_k, w_{k-1}, ..., w_1), where \bar{w}_k represents the binary
complement of w_k. Therefore, SW_{x,j} and SW_{y,j} are output buddies at stage j and their
common successors are input buddies at stage j+1. Thus, each pair of input buddies and output
buddies can easily be identified at each stage of Γ.

Fig. 2.7. (a) The buddy property. (b) The interstage buddy property in an MIN.

Now, to show the strict buddy property of Γ, we must verify that for each pair of input
buddies at stage j+1 there exists another pair of input buddies such that they constitute two
pairs of output buddies.

Let (SW_{x,j+1}, SW_{y,j+1}) and (SW_{x',j+1}, SW_{y',j+1}) be two pairs of input buddies and
let (a_{n-1}, ..., a_{k+1}, a_{k-1}, ..., a_1, d) and
(b_{n-1}, ..., b_{k+1}, b_{k-1}, ..., b_1, d) be their labels, where d = 0 corresponds to
SW_{x,j+1} and SW_{x',j+1}, and d = 1 corresponds to SW_{y,j+1} and SW_{y',j+1}. Assume stage
j+1 is connected to stage j+2 through a σ_l connection pattern. If we let a_i = b_i = c_i,
i ∈ [1, l-1] ∪ [l+1, k-1] ∪ [k+1, n-1], and a_l = \bar{b}_l, then SW_{x,j+1} and SW_{x',j+1}
will form output buddies with common successors
(c_{n-1}, ..., c_{k+1}, c_{k-1}, ..., c_{l+1}, c_{l-1}, ..., c_1, 0, d) at stage j+2.
Similarly, SW_{y,j+1} and SW_{y',j+1} will form output buddies with common successors
(c_{n-1}, ..., c_{k+1}, c_{k-1}, ..., c_{l+1}, c_{l-1}, ..., c_1, 1, d) at stage j+2. It is
also clear that this is the only possible way to specify two pairs of input buddies which
also constitute two pairs of output buddies. Hence, at any switching stage, for each pair of
input buddies the uniqueness of selecting another pair of input buddies to constitute two
pairs of output buddies is verified. The argument is also applicable to the boundary
switching stages, stage 0 and stage n-1, by imagining inputs (outputs) of Γ to be output
(input) links of a pseudo switching stage. Therefore, Γ satisfies the strict buddy
property. □

Now, we discuss the decomposition property on the class of n-stage CP^min.

THEOREM 5: An n-stage CP^min type MIN Γ can be decomposed from the view of each stage j,
j ∈ [0, n-1], as follows (see Fig. 2.8):

(Forward) Stage j is followed by two disjoint (n-j-1)-stage subnetworks, N_0 and N_1, such
that each switching element on stage j is connected to an input of N_0 (N_1) via its 0
(1)-output link.

(Backward) Stage j is preceded by two disjoint j-stage subnetworks, M_0 and M_1, such that
each switching element on stage j is connected to an output of M_0 (M_1) via its 0 (1)-input
link.

Proof: As in Theorem 4, without loss of generality, we assume only subshuffle connection
patterns are used on Γ. Also we assume stage j is connected to stage j+1 by a σ_k connection
pattern and stage j+1 is connected to stage j+2 by a σ_l connection pattern. Consider the
forward case first. It is clear that, on stage j and stage j+1, each 0-output link at stage j
with label (w_{n-1}, ..., w_1, 0) is connected to a switching element at stage j+1 with label
(w_{n-1}, ..., w_{k+1}, w_{k-1}, ..., w_1, 0); and each 1-output link at stage j with label
(w_{n-1}, ..., w_1, 1) is connected to a switching element at stage j+1 with label
(w_{n-1}, ..., w_{k+1}, w_{k-1}, ..., w_1, 1). Thus, each switching element at stage j is
connected to a set of 2^{n-2} switching elements, S_0, with general label form (d, ..., d, 0)
at stage j+1 via its 0-output link. In a similar way, each switching element at stage j is
connected to a set of 2^{n-2} switching elements, S_1, with general label form (d, ..., d, 1)
at stage j+1 via its 1-output link.

Now,  we  want  to  show  that  each  one  of  5g  and  5i  can  span  an  (/i-y-1)- stage  subnet¬ 
work  and.  particularly,  these  two  spanned  subnetworks  are  disjoint.  For  any  switching  ele¬ 
ment  €  5g,  its  corresponding  switching  element 


Fig.  2.3.  The  decomposition  propeny  of  a 
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which  has  the  label  to  form  output  buddies  is  still  an 

element  in  Sq.  There  are  a  total  of  2"“^  pairs  of  output  buddies  in  Sq  which  can  span  a  set  of 
2''~^  switching  elements  consisting  of  2"“^  pairs  of  input  buddies  at  stage  J+2.  Each  output 
buddies  in  Sq  with  label  form  .  .  .  ,jci,0)  spans  input 

buddies  with  label  form  .  .  .  ,Xi,0,d)  at  stage  J+2. 

According  to  Theorem  3  and  the  definition  of  CP(n )  type  connection  patterns,  some  bit  in 

label  .  .  .  .Xk+iJCk^i,  .  .  .  . ati,0,d,d)  is  permuted  to  the  position  of  LSB. 

.  .  .  .-tjk+iA/k-i.  •  •  •  ,X[+iyXi_i,  .  .  .  ,Xi,0M,d)  is  the  label  form  of  output  links  of  the  set 
of  switching  elements  af  stage  J+2  composed  of  input  buddies 
Un-1>  •  ■  •  .  ■  .  ,X[.^^^Xi_i,  .  .  .  ,.Xi,0,d).  Thus,  we  can  have  another  set  of  2"'^ 

switching elements spanned by $S_0$ at stage $j+3$ with the label form $(\ldots, 0, d, d)$. As we proved in Theorem 6, this is the essence of the strict buddy property, because the corresponding input buddies of each pair of input buddies at stage $j+2$, as required to satisfy the strict buddy property, have the same label form. In general, at each stage $i$, $i \in [j+2, n-1]$, $S_0$ spans a set of $2^{n-2}$ switching elements with general label form $(\ldots, 0, d, d, \ldots, d)$ where the number of d's is equal to i. This follows from the strict buddy property at each stage of $\Gamma$. Hence, $S_0$ spans an $(n-j-1)$-stage subnetwork with $2^{n-1}$ inputs and outputs. We denote it by $N_0$.

Similarly, $S_1$ spans a set of $2^{n-2}$ switching elements at each stage $i$, $i \in [j+2, n-1]$, with general label form $(\ldots, 1, d, d, \ldots, d)$ where the number of d's is equal to i. The $(n-j-1)$-stage subnetwork spanned by $S_1$ is denoted by $N_1$. Obviously, $N_0$ and $N_1$ are disjoint, since their switching elements at each stage belong to two different sets.

For the backward case, the proof is similar. Again, based on the strict buddy property of $\Gamma$, two disjoint $j$-stage subnetworks, $M_0$ and $M_1$, can be found. Each switching element on stage $j$ is connected to an output of $M_0$ ($M_1$) via its 0 (1)-input link.

The last theorem implies an alternative, recursive definition of $CP^{n}$: if $\Gamma$ is decomposed as viewed from stage 0 (stage $n-1$), $N_0$ and $N_1$ ($M_0$ and $M_1$) will be two $(n-1)$-stage subnetworks. We can prove that they belong to the $(n-1)$-stage $CP^{n-1}$ and, thus, can be decomposed in a recursive way.

LEMMA 3: An $n$-stage $CP^{n}$ type MIN $\Gamma$ can be decomposed as viewed from stage 0 (stage $n-1$) such that stage 0 (stage $n-1$) is followed by two disjoint $(n-1)$-stage $CP^{n-1}$ type MINs, $N_0$ and $N_1$ ($M_0$ and $M_1$).

PROOF: Without loss of generality, we assume only subshuffle connection patterns are used on $\Gamma$ and a $k$-subshuffle is used at stage 0. According to Theorem 7, we have $N_0$ and $N_1$, two disjoint $(n-1)$-stage MINs connected to the 0-output links and 1-output links at stage 0. We want to prove that they are two $CP^{n-1}$ type MINs.

As per the proof of Theorem 7, the input addresses associated with $N_0$ have the form

$$S^{N_0} = D^{(0)}(0) = (s_{n-1}, \ldots, s_{k+1}, s_{k-1}, \ldots, s_0, 0) = (s^{N_0}_{n-2}, \ldots, s^{N_0}_1, s^{N_0}_0, 0)$$

which is the label form of 0-output links at stage 0. Similarly, the input addresses associated with $N_1$ have the form

$$S^{N_1} = D^{(0)}(1) = (s_{n-1}, \ldots, s_{k+1}, s_{k-1}, \ldots, s_0, 1) = (s^{N_1}_{n-2}, \ldots, s^{N_1}_1, s^{N_1}_0, 1)$$

which is the label form of 1-output links at stage 0. We know that $\Gamma$ is characterized by function $O$ such that

$$O(s_{n-1}, \ldots, s_0) = (s_{O(n-1)}, \ldots, s_{O(1)}, s_{O(0)}) = (s_{O(n-1)}, \ldots, s_{O(1)}, s_k).$$

Thus, the change of $s_k$ is irrelevant to any partial transition sequence starting from $S^{N_0}$ ($S^{N_1}$) on $N_0$ ($N_1$). Consequently, there exists a restricted function such that

$$(s_{n-1}, \ldots, s_{k+1}, s_{k-1}, \ldots, s_0) \mapsto (s_{O(n-1)}, \ldots, s_{O(1)}).$$

That is to say, according to Theorem 3, $N_0$ and $N_1$ belong to the $(n-1)$-stage $CP^{n-1}$.

Similarly, we can show that $M_0$ and $M_1$ are two disjoint $(n-1)$-stage $CP^{n-1}$ type MINs. □

In general, by properly linking decomposed subnetworks, each MIN in the $CP^{n}$ can be decomposed recursively from the view of each stage. The lemma above gives the special cases at the first stage and the last stage. An efficient partition scheme can be achieved on the $CP^{n}$ based on its topological decomposition property.

THEOREM 8: An $n$-stage $CP^{n}$ type MIN $\Gamma$ is decomposed into four subnetworks $N_0$, $N_1$, $M_0$ and $M_1$ from the view of stage $k$ according to Theorem 7. If we force all switching elements at stage $k$ to 0-state or 1-state, then $M_i$ and $N_j$, $i, j \in \{0, 1\}$, can be linked into an $(n-1)$-stage $CP^{n-1}$ type MIN, denoted by $J_{ij}$.

PROOF: This proof is similar to the last one. By forcing all the switching elements at stage $k$ to 0-state (i.e., the LSB of $S^{(k)}$ is carried unchanged into $D^{(k)}$ on any transition sequence), we can link $M_0$ and $N_0$ ($M_1$ and $N_1$) to form an $(n-1)$-stage MIN. They are denoted as $J_{00}$ and $J_{11}$, respectively. Similarly, two $(n-1)$-stage MINs, $J_{01}$ and $J_{10}$, can be formed by forcing all the switching elements at stage $k$ to 1-state (i.e., the LSB of $S^{(k)}$ is complemented in $D^{(k)}$ on any transition sequence). The change of $s_{O(k)}$ is irrelevant to any joined partial transition sequence on $J_{00}$, $J_{11}$, $J_{01}$ and $J_{10}$. Therefore, all the $J_{ij}$'s are $(n-1)$-stage $CP^{n-1}$ type MINs. □

LEMMA 4: An $n$-stage $CP^{n}$ type MIN $\Gamma$ can be partitioned into two $(n-1)$-stage $CP^{n-1}$ type MINs by forcing all the switching elements at stage $i$ to 0-state or 1-state such that, in each $(n-1)$-stage MIN, all the input addresses agree in bit $O(s_i)$ and all the output addresses agree in bit $I^{-1}(d_i)$.

PROOF: The proof is straightforward on a transition sequence using our path-descriptive methodology. Forcing all switching elements at stage $i$ to 0-state (1-state) is equivalent to forcing the LSB $O(s_i)$ of $S^{(i)}$ to be replaced by itself (its complement) as the result in $D^{(i)}$ on each transition sequence on $\Gamma$. First, we consider the 0-state case. As per the nature of the routing scheme on $\Gamma$, at stage $i$, any message is routed from the $O(s_i)$-input link to the $I^{-1}(d_i)$-output link. That is to say, any input $S$ with bit $O(s_i)$ can communicate with any output $D$ with bit $I^{-1}(d_i) = O(s_i)$. Obviously, we have divided $\Gamma$ into two subnetworks. According to Theorem 7 and Theorem 8, $O(s_i) = I^{-1}(d_i) = 0$ is associated with the $(n-1)$-stage $CP^{n-1}$ type MIN $J_{00}$ and $O(s_i) = I^{-1}(d_i) = 1$ with the $(n-1)$-stage $CP^{n-1}$ type MIN $J_{11}$. Similarly, for the 1-state case, any input $S$ with bit $O(s_i) = 0$ (1) can communicate with any output $D$ with bit $I^{-1}(d_i) = 1$ (0) such that the joined $(n-1)$-stage $CP^{n-1}$ type MIN $J_{01}$ ($J_{10}$) is formed. □

Fig. 2.9. One of the two 3-stage subnetworks on the MIN shown in Fig. 2.5, formed by forcing all the switching elements at stage 0.

Fig. 2.10. One of the two 2-stage subnetworks on $J_{00}$ shown in Fig. 2.5, formed by forcing all the switching elements at stage 2.

For example, by forcing all the switching elements at stage 0 on the MIN shown in Fig. 2.5 to 0-state, we have one of the two 8x8 3-stage subnetworks, $J_{00}$, shown in bold in Fig. 2.9. For subnetwork $J_{00}$, all the input addresses agree in bit $O(s_0) = s_3 = 0$ and all the output addresses agree in bit $I^{-1}(d_0) = d_2 = 0$. The general form of transition sequences on $J_{00}$ is as

follows:

Similarly, by forcing all the switching elements at stage 2 on $J_{00}$, we have one of the two 4x4 2-stage subnetworks shown in Fig. 2.10. All the input addresses agree in bits $O(s_0) = s_3$ and $O(s_2) = s_2$; all the output addresses agree in bits $I^{-1}(d_0) = d_2$ and $I^{-1}(d_2) = d_0$.

Note that not every Banyan type MIN can be partitioned. For example, the MIN in Fig. 2.6(b) is not partitionable.

2.5.  SUMMARY 

In this chapter, we propose a class of Banyan type MINs defined by CP(n) type connection patterns, denoted as $\log_2 N$ $CP^{n}$. This class includes, as special cases, all the well-known MINs presented previously in the literature. We show that the topological structure of each network in this class can be specified by two characteristic functions and, in particular, that the topological equivalence among networks can be interpreted as linear transformations on characteristic functions. Based on characteristic functions, their topology-related functional behavior, such as the simple bit-directed routing scheme, has been discussed. Our methodology can easily be extended to all Banyan type MINs, where, in general, the characteristic functions are two-dimensional matrices rather than one-dimensional permutation functions. The proposed approach also provides a good description of the closed form of all the passable permutations on various MINs, the condition for conflict-free multiple paths, and the network partitioning.
CHAPTER 3


TRANSFORMATION  RULES  FOR 
MULTISTAGE  INTERCONNECTION  NETWORKS 


3.1.  INTRODUCTION 

The general concept of parallel supercomputers is to employ a communication network to interconnect a large number of processors and a large number of memory modules in such a way that processors can communicate with one another and memory modules can be accessed simultaneously without conflict. Although the crossbar network does provide such a capability, it is not economically practical when the number of processors and modules becomes large. A realistic alternative is to use multistage interconnection networks (MINs). Typically, they are designed using $\log_2 N$ stages of $N/2$ $2 \times 2$ switching elements and $(\log_2 N + 1)$ fixed connection patterns to connect $N$ inputs to $N$ outputs such that only the minimum number of switching elements is required to provide full access capability from all the inputs to all the outputs and a unique communication path from any input to any output. Examples are the Omega network [Law75], Baseline and Reverse Baseline network [WuFe80], Indirect Binary Cube network [Pea77], Delta network [Pat81], Cube network [SiSm78], etc. Since the capability of each network in terms of non-conflict permutation communications is different, different networks can be selected to efficiently support different application needs. For example, in [Law75], the Omega network is selected for supporting matrix computations on an array processor. An important question arises as to whether we can reuse the algorithms/software (which have been developed on a system using a MIN) on another system which employs a different MIN. It is well known that as long as two networks are equivalent, there exists, theoretically, a one-to-one mapping function between them such that by relabeling the processors and memories and loading data into the memories based on the new labels, one network can simulate the other. For example, in [WuFe80], it is shown that several networks are functionally equivalent to the Baseline network. So far, many theoretical studies have been performed. For example, in [Agr83], it is shown that networks with full connection and strict buddy property are functionally equivalent; in [BeFo88], necessary conditions for networks to be equivalent are also established. In general, these studies remain at a very abstract level, so that transformation rules between networks are still not available. In fact, it is extremely difficult (if not impossible) to derive those transformation rules from those theoretical studies. Therefore, the research work in the area of network equivalence is still very far from practical application. In this chapter, we will present the mapping functions between equivalent networks in a concise way such that, for the first time, the equivalence can be fully utilized in the design of supercomputers.
3.2. BIT-PERMUTE-COMPLEMENT MULTISTAGE INTERCONNECTION NETWORKS

In this section, a rather general class of equivalent networks based on bit-permute-complement shuffles is outlined, which includes all the above-mentioned networks as special cases and requires a very simple destination-address-based routing scheme. The connection patterns of this class of networks are defined by the family of Bit-Permute-Complement (BPC) type permutations, denoted by $\{\beta_n\}$ and defined as follows.

DEFINITION 1: Let $I = (l_{n-1}, \ldots, l_1, l_0)$ be a number in $\{0, 1, \ldots, N-1\}$. A permutation $\beta \in \{\beta_n\}$ is specified by an $n$-tuple vector $\hat{\beta} = (\lambda_{n-1}\theta(n-1), \ldots, \lambda_1\theta(1), \lambda_0\theta(0))$, where $(\theta(n-1), \ldots, \theta(1), \theta(0))$ is a permutation of $(n-1, \ldots, 1, 0)$ and $\lambda_i \in \{-1, 1\}$, $0 \le i \le n-1$, such that $\beta(I) = (m_{n-1}, \ldots, m_1, m_0)$, where $m_i = l_{\theta(i)}$ if $\lambda_i = 1$, else $m_i = 1 - l_{\theta(i)}$ if $\lambda_i = -1$. □

In other words, $\beta(I)$ is obtained from $I$ by first permuting the bits of the binary representation of $I$ and then complementing a subset of bits according to the vector $\hat{\beta}$. For example, the bit-reversal permutation $\rho$ is one of the BPC type permutations and $\rho(I) = (l_0, l_1, \ldots, l_{n-1})$, where $\hat{\rho} = (0, 1, \ldots, n-1)$. Another example is the perfect shuffle permutation $\sigma$ with $\sigma(I) = (l_{n-2}, \ldots, l_0, l_{n-1})$, where $\hat{\sigma} = (n-2, \ldots, 0, n-1)$. Any connection pattern defined by a permutation $\beta$ has a link connecting the $I$th output port of one switching stage to the $\beta(I)$th input port of the next switching stage, for all $0 \le I \le 2^n - 1$. Similarly, the inverse function $\beta^{-1}$ is defined as follows.

DEFINITION 2: Let $\beta^{-1}$ be the inverse function of $\beta \in \{\beta_n\}$. The function $\beta^{-1} \in \{\beta_n\}$ is specified by an $n$-tuple vector $\hat{\beta}^{-1} = (\lambda_{\theta^{-1}(n-1)}\theta^{-1}(n-1), \ldots, \lambda_{\theta^{-1}(1)}\theta^{-1}(1), \lambda_{\theta^{-1}(0)}\theta^{-1}(0))$ where $\theta^{-1}$ is the inverse function of $\theta$. □

Let $\alpha$ and $\beta$ be two permutation functions in $\{\beta_n\}$ specified by vectors $\hat{\alpha}$ and $\hat{\beta}$, respectively. The composition of $\alpha$ and $\beta$ is denoted as $\alpha \cdot \beta$ such that $\alpha \cdot \beta(I) = \beta(\alpha(I))$ (i.e., the composed functions are performed from left to right) and is specified by a vector $\hat{\alpha} \cdot \hat{\beta}$. In the following, the notations $\alpha \cdot \beta$ and $\alpha\beta$ will be used interchangeably. For example, consider $n = 4$ and let $\beta$ be specified by $\hat{\beta} = (-2, -1, 0, 3)$ and $I = (1, 0, 0, 1)$. We have $m_3 = 1 - l_2$, $m_2 = 1 - l_1$, $m_1 = l_0$ and $m_0 = l_3$. Hence, $\beta(I) = (1, 1, 1, 1)$. Similarly, it is easy to obtain that $\hat{\beta}^{-1} = (0, -3, -2, 1)$ and $\beta^{-1}(I) = (l_0, 1 - l_3, 1 - l_2, l_1) = (1, 0, 1, 0)$. If $\alpha$ is specified by $\hat{\alpha} = (-1, 2, 3, -0)$, then $\alpha\beta(I) = \beta(\alpha(I)) = \beta(1, 0, 1, 0) = (1, 0, 0, 1)$ and the composition $\alpha\beta$ is specified by the vector $\hat{\alpha} \cdot \hat{\beta} = (-1, 2, 3, -0) \cdot (-2, -1, 0, 3) = (-2, -3, -0, -1)$.
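The BPC operations defined above (application, inverse, and left-to-right composition) can be sketched in a few lines of Python. This is a minimal illustration, not code from the original work: the helper names `parse`, `apply`, `inverse` and `compose` are hypothetical, and vectors are written as strings so that the entry "-0" (complement of bit 0) stays distinct from "0".

```python
def parse(spec):
    """Parse a BPC vector such as "-2,-1,0,3" into (sign, index) pairs.

    Entries are listed from position n-1 down to position 0. A string is
    used (an assumption of this sketch) so that "-0" differs from "0".
    """
    pairs = []
    for token in spec.split(","):
        token = token.strip()
        sign = -1 if token.startswith("-") else 1
        pairs.append((sign, int(token.lstrip("-"))))
    return pairs


def apply(vec, bits):
    """Apply a BPC vector to bits = (l_{n-1}, ..., l_1, l_0)."""
    n = len(bits)
    # Bit l_k is stored at tuple index n-1-k.
    return tuple(bits[n - 1 - k] if sign == 1 else 1 - bits[n - 1 - k]
                 for sign, k in vec)


def inverse(vec):
    """Vector of the inverse permutation (Definition 2)."""
    n = len(vec)
    inv = [None] * n
    for idx, (sign, k) in enumerate(vec):
        i = n - 1 - idx              # output position i reads input bit k
        inv[n - 1 - k] = (sign, i)   # so position k of the inverse reads bit i
    return inv


def compose(a, b):
    """Vector of a.b, where (a.b)(I) = b(a(I)) (left to right)."""
    n = len(a)
    result = []
    for sign_b, k in b:
        sign_a, j = a[n - 1 - k]     # position k of a(I) is (+/-) l_j
        result.append((sign_a * sign_b, j))
    return result


beta = parse("-2,-1,0,3")
alpha = parse("-1,2,3,-0")
I = (1, 0, 0, 1)

print(apply(beta, I))                               # beta(I)
print(apply(inverse(beta), I))                      # beta^{-1}(I)
print(compose(alpha, beta) == parse("-2,-3,-0,-1")) # vector of alpha.beta
```

Run on the worked example of the text ($n = 4$, $I = (1,0,0,1)$), the sketch reproduces the values derived by hand above.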

It is clear that an $N \times N$ (i.e., $N$ inputs and $N$ outputs) MIN (see Fig. 3.1) can be represented by a sequence of BPC permutation and exchange operations

$$P^0, E^0, P^1, \ldots, P^{n-1}, E^{n-1}, P^n$$

where $n = \log_2 N$, each $P^i$ ($0 \le i \le n$) represents a BPC permutation operation, and each $E^i$ ($0 \le i \le n-1$) denotes an exchange operation. That is, the $i$th connection pattern corresponds to the BPC permutation operation $P^i$ and the $i$th switching stage corresponds to the exchange operation $E^i$. Each $E^i$ represents a BPC permutation operation such that the least significant bit (LSB) of its operand is replaced by another bit which is either the original bit or its complement, i.e., $E^i$ is specified by a vector either $(n-1, n-2, \ldots, 1, 0)$ or $(n-1, n-2, \ldots, 1, -0)$. In other words, $E^i(l_{n-1}, \ldots, l_1, l_0) = (l_{n-1}, \ldots, l_1, l_0)$ or $(l_{n-1}, \ldots, l_1, \bar{l}_0)$. Assume that the addresses of both network inputs and outputs are numbered from 0 to $N-1$ following a natural order in the drawing. (Note that each address can be represented by an $n$-bit binary number.) Consider a communication connection to be established from a network input with address $S = (s_{n-1}, \ldots, s_1, s_0)$ to a network output with address $D = (d_{n-1}, \ldots, d_1, d_0)$. During the propagation, the source address bits (i.e., $s_i$'s or $\bar{s}_i$'s, the complements of the $s_i$'s) are replaced by destination bits (i.e., $d_i$'s or $\bar{d}_i$'s) one by one at each switching stage. The orders for source address bits to be removed and for destination bits to be introduced are determined by the BPC permutation operations (i.e., the $P^i$'s). In principle, during each BPC permutation operation, a source address bit (or its complement) is moved to the position of the LSB and, at the next switching stage of exchange operation, is replaced by a destination bit (or its complement) at the same bit location.

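For example, the BPC permutation operations used in the Omega network are perfect shuffle connections (which are left rotation operations). Any path connecting a network input $S = (s_{n-1}, \ldots, s_1, s_0)$ to a network output $D = (d_{n-1}, \ldots, d_1, d_0)$ on an Omega network can be uniquely expressed by the following transition sequence:

$$
\begin{aligned}
S &= (s_{n-1}, s_{n-2}, \ldots, s_1, s_0) \\
S^0 &= (s_{n-2}, s_{n-3}, \ldots, s_0, s_{n-1}) \\
D^0 &= (s_{n-2}, s_{n-3}, \ldots, s_0, d_{n-1}) \\
S^1 &= (s_{n-3}, s_{n-4}, \ldots, s_0, d_{n-1}, s_{n-2}) \\
D^1 &= (s_{n-3}, s_{n-4}, \ldots, s_0, d_{n-1}, d_{n-2}) \\
&\;\;\vdots \\
S^i &= (s_{n-2-i}, s_{n-3-i}, \ldots, s_0, d_{n-1}, \ldots, d_{n-i}, s_{n-1-i}) \\
D^i &= (s_{n-2-i}, s_{n-3-i}, \ldots, s_0, d_{n-1}, \ldots, d_{n-i}, d_{n-1-i}) \\
&\;\;\vdots \\
S^{n-1} &= (d_{n-1}, d_{n-2}, \ldots, d_1, s_0) \\
D^{n-1} &= (d_{n-1}, d_{n-2}, \ldots, d_1, d_0) \\
S^n &= (d_{n-1}, d_{n-2}, \ldots, d_1, d_0) = D.
\end{aligned}
$$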

In the transition sequence, each number $S^i$, $0 \le i \le n-1$, is the address of the input port through which the path traverses stage $i$ and each number $D^i$, $0 \le i \le n-1$, the output port through which the path traverses stage $i$. Obviously, either $[S^i]_{n-1:1}$ or $[D^i]_{n-1:1}$ (i.e., the first $n-1$ bits of the binary representation of $S^i$ or $D^i$) is the address of the switching element through which the path traverses stage $i$. Source bit $s_{n-1-i}$ is moved to the LSB position of $S^i$ at stage $i$ and is replaced with the destination bit $d_{n-1-i}$ of $D$ in the LSB position of $D^i$. Note that in the transition sequence we have the following relations: $P^0(S) = S^0$, $E^i(S^i) = D^i$ for all $0 \le i \le n-1$, and $P^i(D^{i-1}) = S^i$ for all $1 \le i \le n$. Therefore, the order for the source bits to be removed is $n-1, n-2, n-3, \ldots, 2, 1, 0$ and the order for the destination bits to be introduced is also $n-1, n-2, n-3, \ldots, 2, 1, 0$. Here, two vectors $\hat{O} = (0, 1, 2, \ldots, n-2, n-1)$ and $\hat{I}^{-1} = (0, 1, 2, \ldots, n-2, n-1)$, which correspond to two permutation functions $O$ and $I^{-1}$ in $\{\beta_n\}$, are used to represent these two sequences, respectively. Note that by using

$$\hat{O} = (O(n-1), O(n-2), \ldots, O(2), O(1), O(0))$$
$$\text{and } \hat{I}^{-1} = (I^{-1}(n-1), I^{-1}(n-2), \ldots, I^{-1}(2), I^{-1}(1), I^{-1}(0))$$

we mean that at stage $j$ source bit $s_{O(j)}$ will be removed and replaced by destination bit $d_{I^{-1}(j)}$. Moreover, the meaning of the permutation function $I$ is as follows: if bit $s_{O(j)}$ is replaced by bit $d_j$ at stage $j$, then the order of bits of $I(D)$ represents the disturbed order of bits of $D$. For the Omega network, we have $I^{-1}(D) = I(D)$. Similarly, the $\hat{O}$ and $\hat{I}^{-1}$ for the Baseline network [WuFe80] are given as follows:

$$\hat{O} = (n-1, n-2, \ldots, 1, 0)$$
$$\hat{I}^{-1} = (0, 1, \ldots, n-2, n-1).$$
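The Omega transition sequence can be traced mechanically. The sketch below is illustrative only: the function name `omega_path` is hypothetical, and it assumes, as the sequence above does, that the last connection pattern $P^n$ is the identity, so that $D^{n-1} = D$.

```python
def omega_path(s_bits, d_bits):
    """Trace the path from S to D through an n-stage Omega network.

    s_bits = (s_{n-1}, ..., s_0) and d_bits = (d_{n-1}, ..., d_0).
    Returns a list of (S^i, D^i) pairs, one per stage i = 0 .. n-1.
    """
    n = len(s_bits)
    current = tuple(s_bits)
    steps = []
    for i in range(n):
        s_i = current[1:] + current[:1]   # perfect shuffle = rotate left by one
        d_i = s_i[:-1] + (d_bits[i],)     # exchange: LSB becomes d_{n-1-i}
        steps.append((s_i, d_i))
        current = d_i
    return steps


S = (1, 0, 1, 1)
D = (0, 1, 1, 0)
for i, (s_i, d_i) in enumerate(omega_path(S, D)):
    switch = s_i[:-1]            # [S^i]_{n-1:1}: address of the switching element
    state = s_i[-1] ^ d_i[-1]    # 0 = straight, 1 = exchange
    print(i, s_i, d_i, switch, state)
```

At stage $i$ the LSB of $S^i$ is $s_{n-1-i}$ and the LSB of $D^i$ is $d_{n-1-i}$, which is the removal and introduction order stated in the text.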

It is clear that there is a huge number of MINs in this class which are constructed by the BPC permutation connections and possess the unique-path and full-access properties. Here we use the term bit-permute-complement MINs to represent this class. Generally speaking, the bit-permute-complement MINs are a class of topologically equivalent networks which have similar routing behavior and thus a similar expression of transition sequences to that of Omega networks. This class of MINs includes the six networks mentioned in [WuFe80] as special cases. The connection patterns used between their stages are a specified set from $\{\beta_n\}$. Their transition sequence, which represents any path connecting network input $S = (s_{n-1}, \ldots, s_1, s_0)$ to network output $D = (d_{n-1}, \ldots, d_1, d_0)$, has the following properties.

(1) Each bit of the source $S$ (or its complement) will be permuted to the position of the LSB in some $S^i$ and then be replaced by a bit of the destination $D$ (or its complement) in $D^i$, $0 \le i \le n-1$. Therefore, there exist two permutation functions $O, I^{-1} \in \{\beta_n\}$ such that $O(S)$ corresponds to the order for bits of $S$ to be permuted to the position of the LSB (i.e., $[O(S)]_i$ is the LSB in $S^i$) and $I^{-1}(D)$ corresponds to the order for bits of $D$ to replace bits of $S$ (i.e., $[I^{-1}(D)]_i$ replaces $[O(S)]_i$ in $D^i$). The physical meaning of permutation function $I$ is as follows: if the $i$th bit of a number $X = (x_{n-1}, \ldots, x_1, x_0)$, instead of bit $[I^{-1}(D)]_i$, replaces $[O(S)]_i$ in $D^i$, then $I(X)$ represents the final destination which the source $S$ will reach.

(2) The data is routed from input port $[O(S)]_i$ to output port $[I^{-1}(D)]_i$ of the switching element at stage $i$ and the address of the switching element is either $[S^i]_{n-1:1}$ or $[D^i]_{n-1:1}$.

(3) The routing scheme of this class of MINs can be described as follows. Let the symbol $\oplus$ represent the exclusive-or operation. Bit $[I^{-1}(D)]_i$ is used as the routing tag for the switching element at stage $i$ such that the data is routed from input port $[O(S)]_i$ to output port $[I^{-1}(D)]_i$. Bit $[O(S)]_i \oplus [I^{-1}(D)]_i$ is used to determine the state of the switching element at stage $i$ if global routing is considered (i.e., if $[O(S)]_i \oplus [I^{-1}(D)]_i = 0$ then the switching element will be in a straight connection state (i.e., 0 state), else the switching element will be in an exchange connection state (i.e., 1 state)). After the path traverses stage $i$, bit $[O(S)]_i$ (i.e., the label of the input port from which the incoming data comes) is preserved to recover the information of the source address. We call this kind of routing scheme the source-preserved and destination-oriented routing scheme. Note that not all the MINs with full access capability and unique-path property possess this kind of simple routing scheme.

Functions $O$ and $I$ are referred to as characteristic functions. It can be shown that these two functions are sufficient to characterize the structure and routing behavior of any MIN in the class of bit-permute-complement MINs. Note that given any two permutation functions $O$ and $I$ in $\{\beta_n\}$, there are many different corresponding sequences of BPC permutation operations ($P^0$, $P^1$, ..., $P^{n-1}$, $P^n$) such that the MIN constructed by any one of them can be characterized by functions $O$ and $I$. Each corresponding sequence of BPC permutation operations represents a drawing of a MIN. It can be proved that there are in total $[2^{n-1}(n-1)!]^{n}$ different drawings (i.e., MINs) specified by the same pair of characteristic functions. It can also be shown that the permutation capability of any MIN in this class is uniquely characterized by these two functions. Thus, we say two MINs in the class of bit-permute-complement MINs are functionally equivalent if their characteristic functions are the same.

For example, in Fig. 3.2, a 16x16 bit-permute-complement MIN defined by a sequence of BPC permutation operations is shown. Let the characteristic functions of this network, $O$, $I$ and $I^{-1}$, be specified by vectors $\hat{O}$, $\hat{I}$ and $\hat{I}^{-1}$, respectively. Let the connection pattern $P^i$, $0 \le i \le n$, be specified by vector $\hat{P}^i$. We have $\hat{P}^0 = (2, -1, -0, 3)$, $\hat{P}^1 = (2, 3, 0, -1)$, $\hat{P}^2 = (-0, 1, 3, -2)$, $\hat{P}^3 = (2, -0, 3, 1)$, $\hat{P}^4 = (1, 3, 0, 2)$, $\hat{O} = (-1, -2, 0, 3)$, $\hat{I} = (-1, 0, 3, -2)$, and $\hat{I}^{-1} = (1, -0, -3, 2)$. For any path connecting a source $S$ to a destination $D$, bit $d_2$ is used as the routing tag at stage 0, bit $\bar{d}_3$ is used as the routing tag at stage 1, bit $\bar{d}_0$ is used as the routing tag at stage 2, and bit $d_1$ is used as the routing tag at stage 3. The states of switching elements from stage 0 to stage 3 are determined by $s_3 \oplus d_2$, $s_0 \oplus \bar{d}_3$, $\bar{s}_2 \oplus \bar{d}_0$, and $\bar{s}_1 \oplus d_1$, respectively.

Hence, for the path connecting $S = 1$ to $D = 12$, the routing tags are $(d_1, \bar{d}_0, \bar{d}_3, d_2) = (0, 1, 0, 1)$ and the states are $(\bar{s}_1 \oplus d_1, \bar{s}_2 \oplus \bar{d}_0, s_0 \oplus \bar{d}_3, s_3 \oplus d_2) = (1, 0, 1, 1)$.
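As a cross-check of this example, the path can be followed through the five connection patterns of Fig. 3.2 while the routing tags $[I^{-1}(D)]_i$ are applied at each stage. The sketch below is an illustration under the vector values quoted above; the helper names `parse`, `apply` and `route` are hypothetical.

```python
def parse(spec):
    """Parse "-2,-1,0,3" into (sign, index) pairs, position n-1 first."""
    pairs = []
    for token in spec.split(","):
        token = token.strip()
        sign = -1 if token.startswith("-") else 1
        pairs.append((sign, int(token.lstrip("-"))))
    return pairs


def apply(vec, bits):
    """Apply a BPC vector to bits = (l_{n-1}, ..., l_0)."""
    n = len(bits)
    return tuple(bits[n - 1 - k] if sign == 1 else 1 - bits[n - 1 - k]
                 for sign, k in vec)


# Connection patterns P^0 .. P^4 and characteristic vectors of Fig. 3.2.
PATTERNS = [parse(p) for p in
            ("2,-1,-0,3", "2,3,0,-1", "-0,1,3,-2", "2,-0,3,1", "1,3,0,2")]
O = parse("-1,-2,0,3")
I_INV = parse("1,-0,-3,2")


def route(s_bits, d_bits):
    """Return (routing tags, switch states, address reached)."""
    n = len(s_bits)
    tags = apply(I_INV, d_bits)     # ([I^-1(D)]_{n-1}, ..., [I^-1(D)]_0)
    ports = apply(O, s_bits)        # ([O(S)]_{n-1}, ..., [O(S)]_0)
    states = tuple(p ^ t for p, t in zip(ports, tags))
    current = tuple(s_bits)
    for i in range(n):
        s_i = apply(PATTERNS[i], current)         # S^i
        assert s_i[-1] == ports[n - 1 - i]        # LSB of S^i is [O(S)]_i
        current = s_i[:-1] + (tags[n - 1 - i],)   # D^i: LSB replaced by the tag
    return tags, states, apply(PATTERNS[n], current)


tags, states, reached = route((0, 0, 0, 1), (1, 1, 0, 0))   # S = 1, D = 12
print(tags, states, reached)
```

Both tuples are listed from stage $n-1$ down to stage 0, matching the order used in the text, and the address reached equals $D$.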

3.3. NETWORK TRANSFORMATION RULES

The transformation rules for transforming a MIN (which is characterized by two functions $O_1$ and $I_1$ and denoted as MIN$_1$) into another MIN (which is characterized by another two functions $O_2$ and $I_2$ and denoted as MIN$_2$) are discussed in this section.

THEOREM 1: By adding two fixed connection patterns $\alpha$ and $\beta$ at the input and output sides, respectively, of a MIN (which is characterized by two functions $O_1$ and $I_1$ and denoted as MIN$_1$) which are defined as follows:

$$\alpha = O_2 O_1^{-1}$$
$$\beta = I_1^{-1} I_2$$

then the resultant MIN becomes a MIN$_2$ characterized by another two functions $O_2$ and $I_2$.

Fig. 3.2. A 16 x 16 bit-permute-complement MIN defined by a sequence of BPC permutation operations.

PROOF: (see Fig. 3.3) Two fixed connection patterns $\alpha$ and $\beta$ are added to the input and output sides of MIN$_1$. Let us call it the new MIN$_1$. As discussed in previous sections, for the transition sequence representing the path connecting a source $S$ to a destination $D$ in the new MIN$_1$, $\alpha O_1(S)$ corresponds to the order for bits of $S$ to be permuted to the position of the LSB (i.e., $[\alpha O_1(S)]_i$ is the LSB in $S^i$) and $I_1\beta(D)$ corresponds to the disturbed order for bits of $D$, if we use bit $d_i$ to replace $[\alpha O_1(S)]_i$ in $D^i$. Obviously, the new MIN$_1$ can be characterized by two permutation functions: $\alpha O_1$ and $I_1\beta$. Since $\alpha = O_2 O_1^{-1}$ and $\beta = I_1^{-1} I_2$, we have $\alpha O_1 = O_2 O_1^{-1} O_1 = O_2$ and $I_1\beta = I_1 I_1^{-1} I_2 = I_2$. That is, the new MIN$_1$ is equivalent to MIN$_2$. □

A renumbering scheme, instead of added connection patterns, can also transform one MIN into another. Thus, another way to describe Theorem 1 is as follows:

THEOREM 2: By renumbering the addresses of network inputs and outputs of a MIN (which is characterized by two functions $O_1$ and $I_1$ and denoted as MIN$_1$) in the following way:

the new address of network input $S$ = $O_1 O_2^{-1}(S)$
the new address of network output $D$ = $I_1^{-1} I_2(D)$,

then the resultant MIN becomes a MIN$_2$ which is characterized by another two functions $O_2$ and $I_2$.

PROOF: It is clear that if we renumber the address of network input $S$ with the new address $\alpha^{-1}(S) = O_1 O_2^{-1}(S)$ and the address of network output $D$ with the new address $\beta(D) = I_1^{-1} I_2(D)$, then it is equivalent to adding two connection patterns $\alpha$ and $\beta$ at the input and output sides of MIN$_1$. Thus, as mentioned in Theorem 1, the resultant MIN becomes a MIN$_2$ which is characterized by two functions $O_2$ and $I_2$. □

Fig. 3.3. Transformation between MINs.

The new MIN$_1$ generated by applying Theorem 1 or Theorem 2 is functionally equivalent to MIN$_2$ except for the drawing. Both the new MIN$_1$ and MIN$_2$ follow the same routing scheme, i.e., they use the same routing tag $I_2^{-1}(D)$ for connecting the source $S$ to the destination $D$ and bit $[O_2(S)]_i \oplus [I_2^{-1}(D)]_i$ to control the state of the switching element at stage $i$ if global routing is considered. For example, consider the case where MIN$_1$ is a 16 x 16 Omega network and MIN$_2$ is the 16 x 16 bit-permute-complement MIN shown in Fig. 3.2. As mentioned above, their characteristic functions $O_1$, $I_1$, $O_2$, and $I_2$ can be specified by the following vectors: $\hat{O}_1 = (0, 1, 2, 3)$, $\hat{O}_1^{-1} = (0, 1, 2, 3)$, $\hat{I}_1 = (0, 1, 2, 3)$, $\hat{I}_1^{-1} = (0, 1, 2, 3)$, $\hat{O}_2 = (-1, -2, 0, 3)$, $\hat{O}_2^{-1} = (0, -2, -3, 1)$, $\hat{I}_2 = (-1, 0, 3, -2)$, and $\hat{I}_2^{-1} = (1, -0, -3, 2)$. From Theorem 1, if we add two connection patterns $\alpha$ and $\beta$ at the input and output sides of MIN$_1$, such that

$$\hat{\alpha} = \hat{O}_2 \hat{O}_1^{-1} = (-1, -2, 0, 3) \cdot (0, 1, 2, 3)^{-1} = (-1, -2, 0, 3) \cdot (0, 1, 2, 3) = (3, 0, -2, -1)$$

$$\hat{\beta} = \hat{I}_1^{-1} \hat{I}_2 = (0, 1, 2, 3)^{-1} \cdot (-1, 0, 3, -2) = (0, 1, 2, 3) \cdot (-1, 0, 3, -2) = (-2, 3, 0, -1),$$

then the resultant MIN$_1$, which is functionally equivalent to MIN$_2$ (except that the arrangement of positions of switching elements is different), is shown in Fig. 3.4.

However, if we apply Theorem 2, then the network inputs and outputs are renumbered according to functions $\alpha^{-1}$ and $\beta$, respectively. That is,

$$\hat{\alpha}^{-1} = \hat{O}_1 \hat{O}_2^{-1} = (0, 1, 2, 3) \cdot (-1, -2, 0, 3)^{-1} = (0, 1, 2, 3) \cdot (0, -2, -3, 1) = (3, -1, -0, 2)$$

$$\hat{\beta} = (-2, 3, 0, -1)$$

Fig. 3.4. Transformation from a 16 x 16 Omega network to a bit-permute-complement MIN shown in Fig. 3.2.

Fig. 3.5. Transformation from a 16 x 16 Baseline network to a 16 x 16 Omega network.

Thus, the new address of network input $S = (s_3, s_2, s_1, s_0)$ is $\alpha^{-1}(S) = (s_3, \bar{s}_1, \bar{s}_0, s_2)$ and the new address of network output $D = (d_3, d_2, d_1, d_0)$ is $\beta(D) = (\bar{d}_2, d_3, d_0, \bar{d}_1)$.
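The vectors $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\alpha}^{-1}$ of this Omega example can be recomputed from the characteristic vectors with a short sketch. The helper names below (`parse`, `apply`, `inverse`, `compose`) are hypothetical; composition is taken left to right, as defined in Section 3.2.

```python
def parse(spec):
    """Parse "-2,-1,0,3" into (sign, index) pairs, position n-1 first."""
    pairs = []
    for token in spec.split(","):
        token = token.strip()
        sign = -1 if token.startswith("-") else 1
        pairs.append((sign, int(token.lstrip("-"))))
    return pairs


def apply(vec, bits):
    """Apply a BPC vector to bits = (l_{n-1}, ..., l_0)."""
    n = len(bits)
    return tuple(bits[n - 1 - k] if sign == 1 else 1 - bits[n - 1 - k]
                 for sign, k in vec)


def inverse(vec):
    """Vector of the inverse BPC permutation."""
    n = len(vec)
    inv = [None] * n
    for idx, (sign, k) in enumerate(vec):
        inv[n - 1 - k] = (sign, n - 1 - idx)
    return inv


def compose(a, b):
    """Vector of a.b with (a.b)(X) = b(a(X))."""
    n = len(a)
    return [(a[n - 1 - k][0] * sign_b, a[n - 1 - k][1]) for sign_b, k in b]


# MIN_1: 16 x 16 Omega network; MIN_2: the network of Fig. 3.2.
O1 = I1 = parse("0,1,2,3")
O2 = parse("-1,-2,0,3")
I2 = parse("-1,0,3,-2")

alpha = compose(O2, inverse(O1))    # Theorem 1: alpha = O2 . O1^{-1}
beta = compose(inverse(I1), I2)     # Theorem 1: beta  = I1^{-1} . I2

print(alpha == parse("3,0,-2,-1"), beta == parse("-2,3,0,-1"))
print(compose(alpha, O1) == O2, compose(I1, beta) == I2)

# Theorem 2: renumber input S = 1 and output D = 12 instead of adding patterns.
print(apply(inverse(alpha), (0, 0, 0, 1)), apply(beta, (1, 1, 0, 0)))
```

The second print line checks the two identities used in the proof of Theorem 1, $\alpha O_1 = O_2$ and $I_1 \beta = I_2$.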

Consider another example where MIN$_1$ is a 16x16 Baseline network and MIN$_2$ is a 16x16 Omega network. Again, we can derive the characteristic functions $O_1$, $I_1$, $O_2$, and $I_2$ as follows: $\hat{O}_1 = (3, 2, 1, 0)$, $\hat{O}_1^{-1} = (3, 2, 1, 0)$, $\hat{I}_1 = (0, 1, 2, 3)$, $\hat{I}_1^{-1} = (0, 1, 2, 3)$, $\hat{O}_2 = (0, 1, 2, 3)$, $\hat{O}_2^{-1} = (0, 1, 2, 3)$, $\hat{I}_2 = (0, 1, 2, 3)$, and $\hat{I}_2^{-1} = (0, 1, 2, 3)$. From Theorem 1, if we add two connection patterns $\alpha$ and $\beta$ at the input and output sides of MIN$_1$, such that

$$\hat{\alpha} = \hat{O}_2 \hat{O}_1^{-1} = (0, 1, 2, 3) \cdot (3, 2, 1, 0)^{-1} = (0, 1, 2, 3) \cdot (3, 2, 1, 0) = (0, 1, 2, 3)$$
$$\hat{\beta} = \hat{I}_1^{-1} \hat{I}_2 = (0, 1, 2, 3)^{-1} \cdot (0, 1, 2, 3) = (0, 1, 2, 3) \cdot (0, 1, 2, 3) = (3, 2, 1, 0),$$

then the resultant MIN$_1$, which is functionally equivalent to MIN$_2$ (except that the arrangement of positions of switching elements is different), is shown in Fig. 3.5.

However, if we apply Theorem 2, then the network inputs and outputs are renumbered according to functions $\alpha^{-1}$ and $\beta$, respectively. That is,

$$\hat{\alpha}^{-1} = \hat{O}_1 \hat{O}_2^{-1} = (3, 2, 1, 0) \cdot (0, 1, 2, 3)^{-1} = (3, 2, 1, 0) \cdot (0, 1, 2, 3) = (0, 1, 2, 3)$$

$$\hat{\beta} = (3, 2, 1, 0)$$

Thus, the new address of network input $S = (s_3, s_2, s_1, s_0)$ is $\alpha^{-1}(S) = (s_0, s_1, s_2, s_3)$ and the new address of network output $D = (d_3, d_2, d_1, d_0)$ is $\beta(D) = (d_3, d_2, d_1, d_0)$.

The next theorem describes the relative positions of each switching element before and
after a network transformation. Consider the case where MIN$_1$ is transformed to MIN$_2$.
The method to find the relative positions is based on the criterion that two paths
connecting the same source and the same destination in MIN$_1$ and MIN$_2$ pass through
the same switching element at each stage. Thus, if the address of each switching element
of MIN$_1$ is replaced by its relative address in MIN$_2$, or if the address of each
switching element of MIN$_2$ is replaced by its relative address in MIN$_1$, then both
MIN$_1$ and MIN$_2$ become the same drawing. Let MIN$_1$ be defined by a sequence of BPC
operations $(\Phi_0, \Phi_1, \ldots, \Phi_n)$ and MIN$_2$ by $(\Psi_0, \Psi_1, \ldots, \Psi_n)$,
for all $\Phi_i, \Psi_i \in \{\beta_n\}$. Thus, the transformed MIN$_1$ is defined by the
sequence of BPC operations $(\alpha \Phi_0, \Phi_1, \ldots, \Phi_{n-1}, \Phi_n \beta)$.

THEOREM 3: By applying Theorem 1 or Theorem 2, consider the case where MIN$_1$ is
transformed to MIN$_2$. Let $SW_1[i, j]$ represent the $j$th switching element at stage $i$
of the transformed MIN$_1$ and $SW_2[i^*, j^*]$ represent the $j^*$th switching element at
stage $i^*$ of MIN$_2$, where $0 \le i, i^* \le n-1$ and $0 \le j, j^* \le N/2-1$. The
following relation transforms both MIN$_1$ and MIN$_2$ to the same drawing:

$$i^* = i,$$

r  =  [Of'-O-Ji  •  •  •  ■  •  •  'P.(2y)J„_i;i, 

or  y  =  )]n-v.v 

PROOF:  It  is  clear  that  •  -  •  Oo^  a“k2y  )  or  ■  ■  •  Og  ^•a'‘(2y +  1) 

represent two network inputs of MIN$_1$ to which $SW_1[i, j]$ is connected. However,
according to the criterion that two paths connecting the same source and the same
destination in MIN$_1$ and MIN$_2$ pass through the same switching element at each stage,
these two network inputs should pass through the same switching element at stage $i$ of
MIN$_2$. Therefore, $j^*$ =
■  ■  •  Og •  ■  •  'P,  (2y  is  the  relative  address  of  SIL,!/.  y]  at  stage  i 

of  .VIIN-,.  Similarly,  y  =  '  ^0 relative 

address of $SW_2[i, j^*]$ at stage $i$ of the transformed MIN$_1$. □

For example, to know the relative position of each switching element of the transformed
MIN$_1$ in Fig. 3.4 (which is transformed from an Omega network) with respect to that of
MIN$_2$ in Fig. 3.2, we only have to check their transition sequences.

For MIN$_2$ in Fig. 3.2, the transition sequence is:


5 

=  (53,  5  2’ 

5l’ 

So) 

II 

■fb’ 

^■3) 

=  (.^2’  1’ 

■^0’ 

d'}) 

(5  |,  53, 

^2’ 

^o) 

—  (5  j,  52’ 

di' 

Ji) 

=  (dj,  d2t 

Su 

S2) 

=  (<^3,  dj 

> 

’  do) 

=  ((^2,  dg, 

^3 

,  Ti) 

=  (^2,  dg 

,d: 

S’  ^1) 

=  (^3,  ^2, 

d, 

’  do) 

=  D. 

For the transformed MIN$_1$ in Fig. 3.4, the transition sequence is:
5  =  (53,  52’  •^1’  •^0) 

S  =  (5  Q,  5  2’  1’  3) 


10 




= 

^2^ 

d^) 

=  (^2, 

Ti, 

d2. 

Sq) 

=  (T2, 

Tl, 

di. 

d3) 

=  (^1, 

d2. 

43. 

■^’2) 

=  (51. 

d-i^ 

.  d^, 

.  do) 

=  (42> 

43, 

do. 

si) 

id  2, 

,  d-x 

,  dQ,  d-^) 

5^ 

=  (dj. 

d2. 

di. 

do) 

=  D. 


Thus, at stage 0, the switching element $SW_1[0, j] = SW_1[0, (j_2, j_1, j_0)]$ in the
transformed MIN$_1$ corresponds to the switching element
$SW_2[0, j^*] = SW_2[0, (\bar{j}_1, j_0, \bar{j}_2)]$ of MIN$_2$, i.e.,
$SW_1[0,0] \to SW_2[0,5]$, $SW_1[0,1] \to SW_2[0,7]$, $SW_1[0,2] \to SW_2[0,1]$,
$SW_1[0,3] \to SW_2[0,3]$, $SW_1[0,4] \to SW_2[0,4]$, $SW_1[0,5] \to SW_2[0,6]$,
$SW_1[0,6] \to SW_2[0,0]$, and $SW_1[0,7] \to SW_2[0,2]$. Similarly, at other stages, we
have $SW_1[1, (j_2, j_1, j_0)] \to SW_2[1, (j_1, j_2, j_0)]$,
$SW_1[2, (j_2, j_1, j_0)] \to SW_2[2, (j_0, j_1, j_2)]$, and
$SW_1[3, (j_2, j_1, j_0)] \to SW_2[3, (j_2, j_0, j_1)]$. The relative positions are shown
in Fig. 3.6.
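
As a quick check of the stage-0 correspondence, the listed pairs are reproduced by the
bit rule j* = (1 - j_1, j_0, 1 - j_2), where the two complements are inferred from the
listed pairs themselves. The following Python sketch evaluates that rule for all eight
switching elements; the function name is illustrative and is not taken from the thesis.

```python
def stage0_relative_position(j: int) -> int:
    """Map SW_1[0, j] to SW_2[0, j*] using j* = (1 - j1, j0, 1 - j2).

    Assumption: j is a 3-bit switching-element index (16 x 16 network).
    """
    j2, j1, j0 = (j >> 2) & 1, (j >> 1) & 1, j & 1
    return ((1 - j1) << 2) | (j0 << 1) | (1 - j2)


mapping = [stage0_relative_position(j) for j in range(8)]
print(mapping)  # [5, 7, 1, 3, 4, 6, 0, 2]
```

The mapping is a bijection on {0, ..., 7}, as expected for a relabeling of switching
elements.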

3.4.  SUMMARY 

In this chapter, the transformation rules for a MIN to simulate another are presented. The
relative positions of each switching element before and after a network transformation are
also described. Both the distributed and global routing schemes are shown to be the same as
those of the original MIN. By using the results presented in this chapter, the parallel
algorithms developed for a MIN can be directly reused on another MIN, so that programming
effort can be greatly reduced.
CHAPTER  4


PERMUTATION  CAPABILITY  OF 
MULTISTAGE  INTERCONNECTION  NETWORKS 


4.1.  INTRODUCTION 

Various properties of the shuffle-exchange type multistage interconnection networks
[WuFe81] have attracted considerable interest over the past decade. In particular, a number
of authors [Law75] [Ste83] [NaSa81] [Len78] [Sto71] have shown that these networks can
perform a wide variety of useful permutations for parallel processing. A permutation is
called admissible on a network iff it can be realized by one pass through the network
without conflict at any switching element. One of the most important tasks in designing a
parallel supercomputer is the selection of a suitable network in order to optimally support
application needs. Before that, we need to understand the permutation capability of each
network. The set of admissible permutations of an Omega network has been characterized in
[Law75] and [Par80], and later expressed more formally in [Pea77]. In their studies, the
characterization of the admissible permutations is expressed by Boolean functions or bit
relations of source tags and destination tags. However, from the viewpoint of applications,
their results did not give any algorithm with low time complexity to determine the
admissibility of a permutation. On the other hand, in the study of Lee [Lee85], the set of
admissible permutations of the inverse Omega network has been characterized by using residue
classes of destination tags. However, her analysis is rather tedious and indirect because it
ignores the structural characteristics of inverse Omega networks. Her result also suffers
from high complexity due to the use of modulo operations. Other than these studies, the
characterization of admissible permutations of networks has seldom been mentioned.

While it has been proved that there exists a class of topologically equivalent networks
with the same hardware complexity [Agr83], very little has been known about what kind of
models can be used to characterize them. In this chapter, we introduce a general model. The
characteristic of the permutation capability of a class of useful networks defined by this
general model, which includes the six famous networks in [WuFe81] as special cases, is
studied. Our analysis is based on the natural structure of a network, which can be specified
by two permutation functions. We start our discussion with Omega networks due to their
regular structure, and then generalize the problem to the general model using
bit-permute-complement connections. Our analysis is more direct, simple and general than all
the previous works. We show that the set of admissible permutations of a network can be
characterized by very simple bit relations depending on the two permutation functions which
specify this network. Our result shows that the time complexity of our proposed algorithm to
determine the admissibility of a permutation on a network is $O(N)$, where $N$ is the number
of inputs/outputs of the network.

The remainder of this chapter is organized as follows. In Section 4.2, the basic
definitions and notations are introduced. Particular attention is devoted to the routing
behavior of Omega networks. In Section 4.3, by introducing a partitioning scheme, a
sequence of substructures (subnetworks) is produced. These substructures are associated
with some specific partitions on network inputs which can be used to characterize admissible
permutations of an Omega network. The characteristic of admissible permutations of Omega
networks is given in Section 4.4. In Section 4.5, a general model of a class of networks is
defined. We show that our analytic methodology can be easily generalized to the general
model. Finally, conclusions are given in Section 4.6.

4.2.  PRELIMINARY 

4.2.A.  Omega  Networks

In this chapter, without loss of generality, we start our discussion on the permutation
capability of Omega networks [Law75] built with 2x2 switching elements. The general problem
of various multistage interconnection networks is discussed in Section 4.5. An $N \times N$
Omega network consists of $n = \log_2 N$ stages of 2x2 switching elements for connecting $N$
network inputs and $N$ network outputs. (Note that, for simplicity, $\log_2 N$ is also
denoted as $\log N$ in this chapter.) Each stage consists of $N/2$ switching elements and the
interconnection pattern between stages is the perfect shuffle permutation. An Omega network
for $N = 16$ is shown in Fig. 4.1. The following conventional notations are used throughout
this chapter. The stages of the network are numbered from 0 through $n-1$ from left to right.
The input/output ports (including network inputs/outputs) of switching elements at each
stage are numbered from 0 through $N-1$ and the switching elements, from 0 through $N/2 - 1$,
from top to bottom. The binary representation of a number $I = (I_{n-1}, \ldots, I_1, I_0)$
(where bit $I_{n-1}$ is the most significant bit (MSB) and bit $I_0$, the least significant
bit (LSB)) is used to represent the address of this number. A set of numbers with a similar
address representation can be represented by a common address label. For example,
$(I_{n-1}, \ldots, I_i, c, \ldots, c)$, where $c = 0$ or $1$ (i.e., don't care) and
$\#(c) = i$ (i.e., the total number of $c$'s is equal to $i$), represents those $2^i$ numbers
with the same first $n - i$ bits in their addresses. The notation $[I]_{a:b}$ is used to
represent a segment of the address $I$ from bit $I_a$ to bit $I_b$, i.e.,
$(I_a, \ldots, I_b)$. If $a = b$, then $[I]_a$ denotes bit $I_a$ in the binary representation
of $I$. (Throughout this chapter, it is assumed that all the variables are integers.) A simple
routing scheme on an Omega network can be described as follows. Let $D(i)$ be the destination
tag of a data packet from network input $i = (i_{n-1}, \ldots, i_1, i_0)$,
$0 \le i \le 2^n - 1$. That is, this data packet from network input $i$ will be routed to the
network output $D(i)$. Then, according to the routing scheme of Omega networks [Law75], bit
$[D(i)]_{n-1-j}$ is used to determine the connection of the switching element at stage $j$,
$0 \le j \le n - 1$, on the path connecting input $i$ to output $D(i)$.

Fig. 4.1. A 16 x 16 Omega network.
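
The destination-tag routing scheme above can be sketched in a few lines of Python. This
is only an illustration under two assumptions that are not stated explicitly in the text:
a perfect shuffle precedes every stage (including stage 0), and a tag bit of 0 selects the
upper output port of a switching element while 1 selects the lower one. The function names
are ours.

```python
def shuffle(addr: int, n: int) -> int:
    """Perfect shuffle: rotate the n-bit address left by one position."""
    msb = (addr >> (n - 1)) & 1
    return ((addr << 1) & ((1 << n) - 1)) | msb


def route(source: int, dest: int, n: int) -> list:
    """Return the (input_port, output_port) pair used at each stage."""
    path = []
    port = source
    for stage in range(n):
        in_port = shuffle(port, n)                 # link pattern before the stage
        tag_bit = (dest >> (n - 1 - stage)) & 1    # bit [D]_{n-1-stage}
        out_port = (in_port & ~1) | tag_bit        # the switch rewrites the LSB
        path.append((in_port, out_port))
        port = out_port
    return path


# Input 5 to output 12 in a 16 x 16 network (n = 4).
print(route(5, 12, 4))  # [(10, 11), (7, 7), (14, 14), (13, 12)]
```

After the last stage the output port equals the destination tag, since every source bit
has been shifted out and replaced by a destination bit.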

If $D(i) \ne D(k)$ for $i \ne k$ and all $0 \le i, k \le N - 1$, then
$D = (D(0), D(1), \ldots, D(N-1))$ represents a permutation of $(0, 1, \ldots, N-1)$. There
are in total $N!$ different permutations of $(0, 1, \ldots, N-1)$ and we denote them as the
set $\{\Sigma_N\}$. Let $\Omega$ denote the set of all the admissible permutations of an
Omega network. Since an Omega network contains $(N \log N)/2$ switching elements, each of
which can be set in either one of the two states (i.e., either straight connection or
crossing connection), different settings of these switching elements pass different
permutations. It can be easily proved that $\#(\Omega) = 2^{\frac{N}{2} \log N}$.
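
For small networks the count of admissible permutations can be checked by brute force. The
sketch below enumerates every setting of the switching elements of an Omega network and
counts the distinct permutations produced. It assumes the usual model in which a perfect
shuffle precedes each stage and state 1 swaps the two ports of a switching element; the
function names are illustrative only.

```python
from itertools import product


def shuffle(addr, n):
    """Perfect shuffle: rotate the n-bit address left by one position."""
    msb = (addr >> (n - 1)) & 1
    return ((addr << 1) & ((1 << n) - 1)) | msb


def omega_permutation(states, n):
    """states[stage][k] is 0 (straight) or 1 (crossing) for switch k.

    Returns a tuple D where D[i] is the output reached from input i.
    """
    size = 1 << n
    position = list(range(size))
    for stage in range(n):
        for i in range(size):
            port = shuffle(position[i], n)
            if states[stage][port >> 1]:
                port ^= 1                      # crossing connection
            position[i] = port
    return tuple(position)


def count_admissible(n):
    """Count distinct permutations over all switch settings (small n only)."""
    half = (1 << n) // 2
    seen = set()
    for bits in product((0, 1), repeat=n * half):
        states = [bits[s * half:(s + 1) * half] for s in range(n)]
        seen.add(omega_permutation(states, n))
    return len(seen)


print(count_admissible(2), count_admissible(3))  # 16 4096
```

For N = 4 and N = 8 the counts agree with 2^((N/2) log N), i.e. 2^4 and 2^12. The
enumeration grows far too quickly to be practical beyond N = 8.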

It is convenient to describe some frequently used permutations. One of them is the family
of Bit-Permute-Complement (BPC) type permutations, denoted by $\{\beta_n\}$.

DEFINITION 1: Let $I = (I_{n-1}, \ldots, I_1, I_0)$ be a number in $\{0, 1, \ldots, N-1\}$.
A permutation $\beta \in \{\beta_n\}$ is specified by an $n$-tuple vector
$\hat{\beta} = (\lambda_{n-1}\theta(n-1), \ldots, \lambda_1\theta(1), \lambda_0\theta(0))$,
where $(\theta(n-1), \ldots, \theta(1), \theta(0))$ is a permutation of
$(n-1, \ldots, 1, 0)$ and $\lambda_i \in \{-1, 1\}$, $0 \le i \le n - 1$, such that
$\beta(I_{n-1}, \ldots, I_1, I_0) = (m_{n-1}, \ldots, m_1, m_0)$, where
$m_i = I_{\theta(i)}$ if $\lambda_i = 1$, else
$m_i = 1 - I_{\theta(i)} = \bar{I}_{\theta(i)}$ if $\lambda_i = -1$. □

In other words, $\beta(I)$ is obtained from $I$ by first permuting the bits of the binary
representation of $I$ and then complementing a subset of bits according to the vector
$\hat{\beta}$. Similarly, the inverse function $\beta^{-1}$ and the absolute function
$|\beta|$ of $\beta$ are defined as follows.

DEFINITION 2: Let $\beta^{-1}$ be the inverse function of $\beta \in \{\beta_n\}$. The
function $\beta^{-1} \in \{\beta_n\}$ is specified by an $n$-tuple vector
$\hat{\beta}^{-1} = (\lambda_{\theta^{-1}(n-1)}\theta^{-1}(n-1), \ldots, \lambda_{\theta^{-1}(1)}\theta^{-1}(1), \lambda_{\theta^{-1}(0)}\theta^{-1}(0))$,
where $\theta^{-1}$ is the inverse function of $\theta$. □

DEFINITION 3: Let $|\beta|$ be the absolute function of $\beta \in \{\beta_n\}$. The
function $|\beta| \in \{\beta_n\}$ is specified by an $n$-tuple vector
$|\hat{\beta}| = (|\lambda_{n-1}|\theta(n-1), \ldots, |\lambda_1|\theta(1), |\lambda_0|\theta(0))$. □

Let $\alpha$ and $\beta$ be two permutation functions in $\{\beta_n\}$, specified by vectors
$\hat{\alpha}$ and $\hat{\beta}$, respectively. The composition of $\alpha$ and $\beta$ is
denoted as $\alpha \cdot \beta$ such that $\alpha \cdot \beta(I) = \alpha(\beta(I))$ (i.e.,
the composed functions are performed from right to left) and is specified by a vector
$\hat{\alpha} \cdot \hat{\beta}$. For example, consider $n = 4$ and let $\beta$ be specified
by $\hat{\beta} = (-2, -1, 0, 3)$ and $I = (1, 0, 0, 1)$. We have $m_3 = 1 - I_2$,
$m_2 = 1 - I_1$, $m_1 = I_0$ and $m_0 = I_3$. Hence, $\beta(I) = (1, 1, 1, 1)$. Similarly, it
is easy to obtain that $\hat{\beta}^{-1} = (0, -3, -2, 1)$,
$\beta^{-1}(I) = (I_0, 1 - I_3, 1 - I_2, I_1) = (1, 0, 1, 0)$ and
$|\hat{\beta}| = (2, 1, 0, 3)$, $|\beta|(I) = (0, 0, 1, 1)$. If $\alpha$ is specified by
$\hat{\alpha} = (-1, 2, 3, -0)$, then
$\alpha \cdot \beta(I) = \alpha(\beta(I)) = \alpha(1, 1, 1, 1) = (0, 1, 1, 0)$ and the
composition $\alpha \cdot \beta$ is specified by the vector
$\hat{\alpha} \cdot \hat{\beta} = (-1, 2, 3, -0) \cdot (-2, -1, 0, 3) = (-0, -1, -2, -3)$.
Also note that the bit-reversal permutation $\rho$ is one of the BPC type permutations such
that $\rho(I) = (I_0, I_1, \ldots, I_{n-2}, I_{n-1})$, where
$\hat{\rho} = (0, 1, \ldots, n-1)$. Another example is the perfect shuffle permutation
$\sigma$ such that $\sigma(I) = (I_{n-2}, \ldots, I_0, I_{n-1})$, where
$\hat{\sigma} = (n-2, \ldots, 0, n-1)$.
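
The worked example above can be reproduced with a small Python sketch of BPC vectors. Each
vector entry is stored as a (sign, index) pair so that "-0" remains distinguishable from
"0"; this representation and all function names are our own choices for illustration and
are not part of the original definitions.

```python
def parse(vec: str) -> list:
    """'-2 -1 0 3' -> [(-1, 2), (-1, 1), (1, 0), (1, 3)], listed MSB first."""
    out = []
    for tok in vec.split():
        if tok.startswith("-"):
            out.append((-1, int(tok[1:])))
        else:
            out.append((1, int(tok)))
    return out


def apply(bpc: list, bits: tuple) -> tuple:
    """bits is (I_{n-1}, ..., I_0); returns (m_{n-1}, ..., m_0)."""
    n = len(bits)
    # bits[n - 1 - k] holds I_k
    return tuple(bits[n - 1 - th] if sg == 1 else 1 - bits[n - 1 - th]
                 for sg, th in bpc)


def inverse(bpc: list) -> list:
    """Vector of the inverse permutation (Definition 2)."""
    n = len(bpc)
    inv = [None] * n
    for pos, (sg, th) in enumerate(bpc):
        i = n - 1 - pos              # this entry defines m_i from I_th
        inv[n - 1 - th] = (sg, i)    # so I_th is recovered from m_i, same sign
    return inv


def compose(a: list, b: list) -> list:
    """Vector of a.b, where (a.b)(I) = a(b(I))."""
    n = len(a)
    return [(sg_a * b[n - 1 - th_a][0], b[n - 1 - th_a][1]) for sg_a, th_a in a]


beta = parse("-2 -1 0 3")
alpha = parse("-1 2 3 -0")
I = (1, 0, 0, 1)
print(apply(beta, I))            # (1, 1, 1, 1)
print(inverse(beta))             # [(1, 0), (-1, 3), (-1, 2), (1, 1)]
print(compose(alpha, beta))      # [(-1, 0), (-1, 1), (-1, 2), (-1, 3)]
```

The printed inverse corresponds to the vector (0, -3, -2, 1) and the printed composition to
(-0, -1, -2, -3), matching the values in the text.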




4.2.B.  Routing  Behavior  of  Omega  Networks 

For an $n$-stage Omega network, due to its regular structure, any path connecting network
input $s = (s_{n-1}, \ldots, s_1, s_0)$ to network output
$D(s) = (d_{n-1}, \ldots, d_1, d_0)$ can be expressed by the following transition sequence:

$$\begin{aligned}
s &= (s_{n-1}, s_{n-2}, \ldots, s_1, s_0) \\
s^0 &= (s_{n-2}, s_{n-3}, \ldots, s_0, s_{n-1}) \\
D^0(s) &= (s_{n-2}, s_{n-3}, \ldots, s_0, d_{n-1}) \\
s^1 &= (s_{n-3}, s_{n-4}, \ldots, s_0, d_{n-1}, s_{n-2}) \\
D^1(s) &= (s_{n-3}, s_{n-4}, \ldots, s_0, d_{n-1}, d_{n-2}) \\
&\;\;\vdots \\
s^i &= (s_{n-i-2}, \ldots, s_0, d_{n-1}, \ldots, d_{n-i}, s_{n-i-1}) \\
D^i(s) &= (s_{n-i-2}, \ldots, s_0, d_{n-1}, \ldots, d_{n-i}, d_{n-i-1}) \\
&\;\;\vdots \\
s^{n-1} &= (d_{n-1}, d_{n-2}, \ldots, d_1, s_0) \\
D^{n-1}(s) &= (d_{n-1}, d_{n-2}, \ldots, d_1, d_0) \\
&= D(s).
\end{aligned}$$

In the transition sequence, each $s^i$, $0 \le i \le n - 1$, represents the address of the
input port of a switching element at stage $i$ through which a path starting from $s$
traverses stage $i$, and each $D^i(s)$, $0 \le i \le n - 1$, the output port of the same
switching element through which the path traverses stage $i$. That is, a data transfer path
of this switching element is connected from input port $s^i$ to output port $D^i(s)$.
Obviously, $[s^i]_{n-1:1} = [D^i(s)]_{n-1:1}$ is the address of this switching element
through which a path traverses stage $i$. Similarly, the idea of the transition sequence can
be used at each stage to express paths which connect a switching element to network inputs
and outputs. Assume that a switching element $E$ at stage $i$ has the address
$E = (e_{n-1}, \ldots, e_1)$ and the address label of the input/output ports of $E$ is
$e = (e_{n-1}, \ldots, e_1, c)$. We have the following two transition sequences to indicate
which network inputs $s$ and outputs $D(e)$ are connected through $E$.

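Backward:

$$\begin{aligned}
e &= (e_{n-1}, e_{n-2}, \ldots, e_1, c) \\
D^{i-1}(e) &= (c, e_{n-1}, \ldots, e_2, e_1) \\
e^{i-1} &= (c, e_{n-1}, \ldots, e_2, c) \\
D^{i-2}(e) &= (c, c, e_{n-1}, \ldots, e_2) \\
&\;\;\vdots \\
D^0(e) &= (c, \ldots, c, e_{n-1}, \ldots, e_i) \\
e^0 &= (c, \ldots, c, e_{n-1}, \ldots, e_{i+1}, c) \\
s &= (c, \ldots, c, e_{n-1}, \ldots, e_{i+1}).
\end{aligned}$$

Forward:

$$\begin{aligned}
e &= (e_{n-1}, e_{n-2}, \ldots, e_1, c) \\
e^{i+1} &= (e_{n-2}, \ldots, e_1, c, e_{n-1}) \\
D^{i+1}(e) &= (e_{n-2}, e_{n-3}, \ldots, e_1, c, c) \\
&\;\;\vdots \\
e^{n-1} &= (e_i, \ldots, e_1, c, \ldots, c, e_{i+1}) \\
D^{n-1}(e) &= (e_i, \ldots, e_1, c, \ldots, c, c) \\
&= D(e).
\end{aligned}$$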

The switching element $E$ at stage $i$ can be viewed as the common root of two communication
binary trees. One is the backward $(i+1)$-level tree with the address label of its leaves
(i.e., switching elements at stage 0) equal to $(c, \ldots, c, e_{n-1}, \ldots, e_{i+1})$,
$\#(c) = i$, and the address label of network inputs connected to leaves equal to
$s = (c, \ldots, c, e_{n-1}, \ldots, e_{i+1})$, $\#(c) = i + 1$. The other one is the forward
$(n-i)$-level tree with the address label of its leaves (i.e., switching elements at stage
$n-1$) equal to $(e_i, \ldots, e_1, c, \ldots, c)$, $\#(c) = n - i - 1$, and the address
label of network outputs connected to leaves equal to
$D(e) = (e_i, \ldots, e_1, c, \ldots, c)$, $\#(c) = n - i$. Thus, in total $2^{i+1}$ network
inputs and $2^{n-i}$ network outputs are connected through $E$. In particular, when $i = 0$
($n - 1$), the above two binary trees are reduced to one in which the root $E$ is at stage 0
($n-1$) and is connected to two network inputs $s = (c, e_{n-1}, \ldots, e_1)$ (two network
outputs $D(e) = (e_{n-1}, \ldots, e_1, c)$) and all the network outputs (inputs).
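
The two address labels can be checked by exhaustive path tracing. The sketch below walks
every source/destination pair through an Omega network and records which network inputs and
outputs use a given switching element; it assumes destination-tag routing with a perfect
shuffle before each stage, and the function names are hypothetical.

```python
def shuffle(addr, n):
    """Perfect shuffle: rotate the n-bit address left by one position."""
    msb = (addr >> (n - 1)) & 1
    return ((addr << 1) & ((1 << n) - 1)) | msb


def connected_through(switch, stage, n):
    """Network inputs and outputs whose paths use `switch` at `stage`."""
    inputs, outputs = set(), set()
    for s in range(1 << n):
        for d in range(1 << n):
            port = s
            for j in range(n):
                port = shuffle(port, n)
                if j == stage and (port >> 1) == switch:
                    inputs.add(s)
                    outputs.add(d)
                port = (port & ~1) | ((d >> (n - 1 - j)) & 1)
    return inputs, outputs


# E = (e3, e2, e1) = (1, 0, 1) at stage i = 1 of a 16 x 16 network.
ins, outs = connected_through(0b101, 1, 4)
print(sorted(ins))   # [2, 6, 10, 14]   label (c, c, e3, e2) = (c, c, 1, 0)
print(sorted(outs))  # [8, 9, 10, 11, 12, 13, 14, 15]   label (e1, c, c, c)
```

The set sizes are 2^(i+1) = 4 inputs and 2^(n-i) = 8 outputs, in line with the counts
stated above.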

4.3.  PERMUTABLE  STRUCTURE 

There are many different ways to partition an Omega network into disjoint subnetworks
by forcing all the switching elements of one or more stages to straight connection (0-state)
or crossing connection (1-state). Any switching element forced to a fixed state can be
removed and replaced by two direct connecting links between its input and output ports. In
this section, by introducing a proper partitioning scheme, a sequence of substructures of an
Omega network, referred to as permutable substructures, is produced. Each substructure is a
subnetwork which can be used to characterize admissible permutations of an Omega network.
Our work is based on the following fact.

THEOREM 1: By forcing all the switching elements at stage $i$ of an $n$-stage Omega network
to 0-state or 1-state, two disjoint $(n-1)$-stage subnetworks are formed such that the
$(n-i-1)$th bits of the input or output addresses in each subnetwork are the same.

PROOF: The proof can be given by referring to the transition sequence described in Section
4.2. By observing the numbers $s^i$ and $D^i(s)$, the following fact can be obtained. Forcing
all the switching elements at stage $i$ to 0-state (1-state) is equivalent to forcing the
LSB, $s_{n-i-1}$, of $s^i$ to be replaced by $s_{n-i-1}$ ($1 - s_{n-i-1}$) in the LSB
position of $D^i(s)$.

Let us consider the case where switching elements at stage $i$ are forced to 0-state. In
each switching element at stage $i$, a data packet is forced to be routed from the input
port $s^i$ to the output port $D^i(s) = s^i$. That is to say, any input $s$ can only
communicate with an output $D(s)$ with the bit $d_{n-i-1} = s_{n-i-1}$. Obviously, two
subnetworks are formed by partitioning the $N$ network inputs and outputs into two groups
such that in each subnetwork the addresses of the $N/2$ network inputs agree in their
$(n-1-i)$th bits (i.e., $s_{n-i-1}$'s), the addresses of the $N/2$ network outputs agree in
their $(n-1-i)$th bits (i.e., $d_{n-i-1}$'s) and $s_{n-i-1} = d_{n-i-1}$. To prove that
these two subnetworks are disjoint, it is sufficient to prove that there are no common
switching elements on these two subnetworks. Assume that $F_0$ represents the subnetwork
with $s_{n-i-1} = d_{n-i-1} = 0$ in addresses of its network inputs/outputs and $F_1$, the
subnetwork with $s_{n-i-1} = d_{n-i-1} = 1$ in addresses of its network inputs/outputs.
Recall that either $[s^j]_{n-1:1}$ or $[D^j(s)]_{n-1:1}$ is the address of the switching
element through which the path connecting $s$ to $D(s)$ traverses stage $j$,
$0 \le j \le n - 1$. By observing any transition sequences on $F_0$ and $F_1$, it is easy to
see that at any stage $j \ne i$, any switching element of $F_0$ is different from any
switching element of $F_1$ in at least one bit position where either bit $s_{n-i-1}$ or
$d_{n-i-1}$ appears. This means no common switching elements exist on $F_0$ and $F_1$. Thus,
$F_0$ and $F_1$ are disjoint.

Similarly, for the 1-state case, two disjoint subnetworks can be formed by partitioning
the $N$ network inputs and outputs into two groups such that in each subnetwork the
addresses of the $N/2$ network inputs agree in their $(n-1-i)$th bits (i.e.,
$s_{n-i-1}$'s), the addresses of the $N/2$ network outputs agree in their $(n-1-i)$th bits
(i.e., $d_{n-i-1}$'s) and $s_{n-i-1} = 1 - d_{n-i-1}$. □

For example, by forcing all the switching elements at stage 1 of the Omega network
shown in Fig. 4.1 to 0-state, two disjoint 3-stage subnetworks are formed. In one of them,
the addresses of the eight network inputs agree in bit $s_2 = 0$ (i.e., they are
{0, 1, 2, 3, 8, 9, 10, 11}) and the eight network outputs, in bit $d_2 = 0$ (i.e., they are
{0, 1, 2, 3, 8, 9, 10, 11}). In the other one, the addresses of the eight network inputs
agree in bit $s_2 = 1$ and the addresses of the eight network outputs agree in bit
$d_2 = 1$. These two subnetworks are shown in Fig. 4.2.
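
The example can be verified by a reachability computation. The sketch below fixes every
switching element of one stage to a given state, leaves all other stages free, and collects
the network outputs reachable from a given input. It assumes a perfect shuffle before each
stage; the function names are our own.

```python
def shuffle(addr, n):
    """Perfect shuffle: rotate the n-bit address left by one position."""
    msb = (addr >> (n - 1)) & 1
    return ((addr << 1) & ((1 << n) - 1)) | msb


def reachable_outputs(source, n, forced_stage, forced_state):
    """Outputs reachable from `source` when every switch at
    `forced_stage` is fixed to `forced_state` (0 straight, 1 crossing)."""
    ports = {source}
    for j in range(n):
        nxt = set()
        for p in ports:
            q = shuffle(p, n)
            if j == forced_stage:
                nxt.add(q ^ forced_state)      # only one output port allowed
            else:
                nxt.update({q & ~1, q | 1})    # either output port
        ports = nxt
    return ports


print(sorted(reachable_outputs(0, 4, 1, 0)))  # [0, 1, 2, 3, 8, 9, 10, 11]
```

With stage 1 forced to 0-state in a 16 x 16 network, every input reaches exactly the eight
outputs that share its bit 2, which is the partition described above.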

Fig. 4.2. Two 3-stage subnetworks are formed by forcing all the switching elements at stage
1 on a 16 x 16 Omega network.

Now, we employ a partitioning scheme on Omega networks which gives a better analytic
way than that in [Lee85] in order to have a global view on the permutation behavior of
Omega networks. The partitioning scheme which can produce a sequence of substructures on
an Omega network is described as follows. According to Theorem 1, if we remove stage $n-1$
of an Omega network, then two $(n-1)$-stage disjoint subnetworks will be produced. Here, by
removing the last stage from a (sub)network we mean that both the last stage and the
connection pattern before the stage are removed and thus the remaining output ports are left
as network outputs of the two disjoint subnetworks. If we remove stage $n-2$ of any one of
these $(n-1)$-stage subnetworks, then another two $(n-2)$-stage disjoint subnetworks will be
produced. This process can be continued by removing stage $i-1$ of any $i$-stage subnetwork
to produce another two $(i-1)$-stage disjoint subnetworks, for all $2 \le i \le n - 1$. When
$i = 2$, after removing stage 1, subnetworks with a single switching element will be
produced. The above argument implies a recursively partitionable structure of Omega
networks. We specify it by the following definition and theorems.

DEFINITION 4: Let $\Phi[n-1, 0]$ be an Omega network and $\Phi[n-2, t]$, $0 \le t \le 1$,
be an $(n-1)$-stage subnetwork produced by removing stage $n-1$ of $\Phi[n-1, 0]$. The
$(i+1)$-stage subnetwork $\Phi[i, t]$, $0 \le i \le n - 2$ and
$0 \le t \le 2^{n-i-1} - 1$, is obtained by removing stage $i+1$ of an $(i+2)$-stage
subnetwork $\Phi[i+1, t']$, $0 \le t' \le 2^{n-i-2} - 1$. Let $\min(\Phi[i, t])$ be the
smallest address of switching elements at stage $i$ of subnetwork $\Phi[i, t]$. We assume
that for any two subnetworks $\Phi[i, t^*]$ and $\Phi[i, t^{**}]$, $t^* > t^{**}$ iff
$\min(\Phi[i, t^*]) > \min(\Phi[i, t^{**}])$. □

The following theorem shows which network inputs are connected to a subnetwork
$\Phi[i, t]$.

THEOREM 2: Let $\Psi(i, t)$ be the set of network inputs connected to $\Phi[i, t]$, where
$0 \le i \le n - 1$, $0 \le t \le 2^{n-i-1} - 1$ and $t = (t_{n-i-2}, \ldots, t_1, t_0)$ is
the binary representation of $t$. Then,
$\Psi(i, t) = \{(c, \ldots, c, t_{n-i-2}, \ldots, t_1, t_0) \mid \#(c) = i + 1\}$ and
$\{(t_{n-i-2}, \ldots, t_1, t_0, c, \ldots, c) \mid \#(c) = i\}$ is the set of switching
elements at stage $i$ of $\Phi[i, t]$. For $i = n - 1$, $\Phi[n-1, 0]$ is an Omega network
and $\Psi(n-1, 0) = \{0, 1, \ldots, N-1\}$. □

PROOF: According to the partitioning scheme, the set of subnetworks
$\{\Phi[i, t] \mid 0 \le t \le 2^{n-i-1} - 1\}$ is produced by removing all the stages from
stage $n-1$ to stage $i+1$. And, according to Theorem 1, the last $n - i - 1$ bits (starting
from the LSB to the $(n-i-2)$th bit) of all the network inputs in set $\Psi(i, t)$
corresponding to the subnetwork $\Phi[i, t]$ are the same. Thus,
$\Psi(i, t) = \{(c, \ldots, c, t'_{n-i-2}, \ldots, t'_1, t'_0) \mid \#(c) = i + 1\}$. To
prove $(t'_{n-i-2}, \ldots, t'_1, t'_0) = (t_{n-i-2}, \ldots, t_1, t_0)$, let us check the
following fact. Let $\min(\Psi(i, t))$ be the smallest address in set $\Psi(i, t)$. Due to
the inverse perfect shuffle connections, when one backtracks from stage $n-1$ of an Omega
network, it is very easy to observe that
$\{(t'_{n-i-2}, \ldots, t'_1, t'_0, c, \ldots, c) \mid \#(c) = i\}$ is the set of switching
elements at stage $i$ of $\Phi[i, t]$. By backtracking $i + 1$ inverse perfect shuffle
connections, we can also see that the set of switching elements
$\{(t_{n-i-2}, \ldots, t_1, t_0, c, \ldots, c) \mid \#(c) = i\}$ is connected to the set of
inputs $\{(c, \ldots, c, t_{n-i-2}, \ldots, t_1, t_0) \mid \#(c) = i + 1\}$. Thus, for any
$t^* > t^{**}$, we always have
$\min(\Phi[i, t^*]) = (t^*_{n-i-2}, \ldots, t^*_1, t^*_0, 0, \ldots, 0) > \min(\Phi[i, t^{**}]) = (t^{**}_{n-i-2}, \ldots, t^{**}_1, t^{**}_0, 0, \ldots, 0)$.
That is,
$(t^*_{n-i-2}, \ldots, t^*_1, t^*_0) > (t^{**}_{n-i-2}, \ldots, t^{**}_1, t^{**}_0)$, which
in turn implies $\min(\Psi(i, t^*)) > \min(\Psi(i, t^{**}))$. □

For example, for $N = 16$, the three sets of subnetworks
$\{\Phi[i, t] \mid 0 \le t \le 2^{3-i} - 1\}$, $0 \le i \le 2$, of an Omega network are
shown in Fig. 4.3(a)(b)(c). In Fig. 4.3(a), the set of two subnetworks
$\{\Phi[2, t] \mid 0 \le t \le 1\}$ is obtained by removing stage 3 from the Omega network.
The set of network inputs $\Psi(2, 0)$ corresponding to the subnetwork $\Phi[2, 0]$ is
$\{(c, c, c, 0)\}$ = {0, 2, 4, 6, 8, 10, 12, 14} and the set of network inputs $\Psi(2, 1)$
associated with the subnetwork $\Phi[2, 1]$ is $\{(c, c, c, 1)\}$ =
{1, 3, 5, 7, 9, 11, 13, 15}. Similarly, in Fig. 4.3(b) and 4.3(c), the sets of network inputs
$\Psi(1, t)$ and $\Psi(0, t)$ corresponding to the subnetworks $\Phi[1, t]$ and
$\Phi[0, t]$ are $\{(c, c, t_1, t_0)\}$ and $\{(c, t_2, t_1, t_0)\}$, respectively.
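
Theorem 2 can be checked numerically for N = 16. The sketch below compares the label formula
for the input sets with the inputs found by tracing paths to the stage-i switching elements
whose leading address bits equal t. It assumes destination-tag routing with a perfect
shuffle before each stage, and identifies a subnetwork with its stage-i switching elements
as the theorem states; the function names are illustrative.

```python
def shuffle(addr, n):
    """Perfect shuffle: rotate the n-bit address left by one position."""
    msb = (addr >> (n - 1)) & 1
    return ((addr << 1) & ((1 << n) - 1)) | msb


def psi(i, t, n):
    """Label formula: inputs whose low n-i-1 bits equal t."""
    low = n - i - 1
    return {s for s in range(1 << n) if s & ((1 << low) - 1) == t}


def inputs_by_simulation(i, t, n):
    """Inputs whose paths cross a stage-i switch with address (t, c, ..., c)."""
    found = set()
    for s in range(1 << n):
        for d in range(1 << n):
            port = s
            for j in range(i + 1):
                port = shuffle(port, n)
                if j == i and (port >> 1) >> i == t:
                    found.add(s)
                port = (port & ~1) | ((d >> (n - 1 - j)) & 1)
    return found


for i in range(4):
    for t in range(1 << (3 - i)):
        assert psi(i, t, 4) == inputs_by_simulation(i, t, 4)

print(sorted(psi(2, 1, 4)))  # [1, 3, 5, 7, 9, 11, 13, 15]
```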

Theorem 2 outlines the structure of each subnetwork in terms of the set of network
inputs connected to it and the set of switching elements at its last stage. Obviously,
because of the recursively partitionable structure of an Omega network, each subnetwork can
also be recursively partitioned like an Omega network. This is based on the fact that the
structure of any subnetwork is identical to that of an Omega network with reduced size. The
following two theorems describe the substructure of a subnetwork after being recursively
partitioned. Since their proofs are similar to that of Theorem 2, we omit them here.

THEOREM 3: Two $i$-stage subnetworks
$\{\Phi[i-1, 2^{n-i-1} \cdot s + t] \mid 0 \le s \le 1\}$ are obtained by removing stage $i$
of the $(i+1)$-stage subnetwork $\Phi[i, t]$, $1 \le i \le n - 1$ and
$0 \le t \le 2^{n-i-1} - 1$. Each $\Phi[i-1, 2^{n-i-1} \cdot s + t]$ is connected to the set
of network inputs
$\Psi(i-1, 2^{n-i-1} \cdot s + t) = \{(c, \ldots, c, s, t_{n-i-2}, \ldots, t_1, t_0) \mid \#(c) = i\}$. □

For example, if we remove stage 2 of the 3-stage subnetwork $\Phi[2, 1]$ (i.e., $i = 2$ and
$t = 1$), then we will obtain two 2-stage subnetworks
$\{\Phi[1, 2s + 1] \mid 0 \le s \le 1\}$. Each $\Phi[1, 2s + 1]$ is connected to the set of
network inputs $\Psi(1, 2s + 1) = \{(c, c, s, 1) \mid \#(c) = 2\}$. That is,
$\Psi(1, 1)$ = {1, 5, 9, 13} and $\Psi(1, 3)$ = {3, 7, 11, 15}.

THEOREM 4: The set of $(j+1)$-stage subnetworks
$\{\Phi[j, 2^{n-i-1} \cdot m + t] \mid 0 \le m \le 2^{i-j} - 1\}$, $0 \le j < i$, is
obtained by removing stages from stage $i$ to stage $j+1$ of the $(i+1)$-stage subnetwork
$\Phi[i, t]$, $1 \le i \le n - 1$ and $0 \le t \le 2^{n-i-1} - 1$. Let
$(m_{i-j-1}, \ldots, m_1, m_0)$ be the binary representation of $m$. Each
$\Phi[j, 2^{n-i-1} \cdot m + t]$ is connected to the set of network inputs
$\Psi(j, 2^{n-i-1} \cdot m + t) = \{(c, \ldots, c, m_{i-j-1}, \ldots, m_1, m_0, t_{n-i-2}, \ldots, t_1, t_0) \mid \#(c) = j + 1\}$. □

Fig. 4.4. For $N = 16$, the three different partitions
$\{\Psi(i, t) \mid 0 \le t \le 2^{3-i} - 1\}$, $0 \le i \le 2$.

For example, for $i = 2$, $j = 0$ and $t = 1$, if we remove stages from stage 2 to stage 1
of the 3-stage subnetwork $\Phi[2, 1]$, then we will obtain the set of 1-stage subnetworks
$\{\Phi[0, 2m + 1] \mid 0 \le m \le 3\}$. Each $\Phi[0, 2m + 1]$ is connected to the set of
network inputs $\{(c, m_1, m_0, 1) \mid \#(c) = 1\}$. That is, $\Psi(0, 1)$ = {1, 9},
$\Psi(0, 3)$ = {3, 11}, $\Psi(0, 5)$ = {5, 13}, and $\Psi(0, 7)$ = {7, 15}.
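
The input sets in Theorems 3 and 4 follow directly from the address labels, so the examples
can be generated mechanically. The following sketch computes the subnetwork index
2^(n-i-1) m + t and the corresponding input set; the function name and argument order are
our own and serve only as an illustration.

```python
def psi_sub(j, i, t, m, n):
    """Index and input set of the (j+1)-stage subnetwork obtained from
    the subnetwork with parameters (i, t) by removing stages i down to j+1.

    Inputs are those whose low n-j-1 bits are (m, t) concatenated.
    """
    index = (m << (n - i - 1)) + t
    low = n - j - 1
    members = {s for s in range(1 << n) if s % (1 << low) == index}
    return index, members


# Theorem 4 example: n = 4, i = 2, j = 0, t = 1.
for m in range(4):
    index, members = psi_sub(0, 2, 1, m, 4)
    print(index, sorted(members))
# 1 [1, 9]
# 3 [3, 11]
# 5 [5, 13]
# 7 [7, 15]
```

Setting j = i - 1 gives the two subnetworks of Theorem 3; for i = 2 and t = 1 the input sets
are {1, 5, 9, 13} and {3, 7, 11, 15}.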

Note that, for each $0 \le i \le n - 1$, the set of subnetworks $\{\Phi[i, t] \mid 0 \le t \le 2^{n-i-1} - 1\}$ corresponds to the set of inputs $\{\Psi(i, t) \mid 0 \le t \le 2^{n-i-1} - 1\}$, which is a partition of all the network inputs $\{0, 1, \ldots, N-1\}$. For each $i$, the partition $\{\Psi(i, t)\}$ contains $2^{n-i-1}$ groups of network inputs. Each group of network inputs has $2^{i+1}$ elements. In the following sections, we will show that these partitions play a major role when we characterize $\Omega$. For example, the three partitions $\{\Psi(i, t) \mid 0 \le t \le 2^{3-i} - 1\}$ associated with the three sets of subnetworks $\{\Phi[i, t] \mid 0 \le t \le 2^{3-i} - 1\}$, $0 \le i \le 2$, are shown in Fig. 4.4. When $i = 3$, $\{\Phi[i, 0]\}$ and $\{\Psi(i, 0)\}$ become trivial cases, i.e., an Omega network and all the network inputs $\{0, 1, \ldots, N-1\}$, respectively. For $i = 2$, 1, and 0, we have

$\{\Psi(2, t) \mid 0 \le t \le 1\} = \{\{0, 2, 4, 6, 8, 10, 12, 14\}, \{1, 3, 5, 7, 9, 11, 13, 15\}\}$,

$\{\Psi(1, t) \mid 0 \le t \le 3\} = \{\{0, 4, 8, 12\}, \{1, 5, 9, 13\}, \{2, 6, 10, 14\}, \{3, 7, 11, 15\}\}$,

$\{\Psi(0, t) \mid 0 \le t \le 7\} = \{\{0, 8\}, \{1, 9\}, \{2, 10\}, \{3, 11\}, \{4, 12\}, \{5, 13\}, \{6, 14\}, \{7, 15\}\}$.
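
The three partitions listed above follow one pattern: each group collects the inputs that agree in their lower $n - i - 1$ address bits. The Python sketch below rebuilds them under that assumption. The closed form `j % 2**(n-i-1) == t` is read off from the listed sets rather than quoted from Theorem 2, and the function name `psi` is ours.

```python
def psi(n, i, t):
    """Inputs of an N = 2**n network whose lower n-i-1 address bits equal t.

    Assumption: this closed form is inferred from the partitions listed
    for N = 16; it is not a quotation of the source's Theorem 2.
    """
    groups = 1 << (n - i - 1)          # number of groups at level i
    return [j for j in range(1 << n) if j % groups == t]


if __name__ == "__main__":
    n = 4
    for i in (2, 1, 0):
        print(i, [psi(n, i, t) for t in range(1 << (n - i - 1))])
```

Each group returned for level $i$ has $2^{i+1}$ elements, which agrees with the group size stated in the text.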

4.4.  PERMUTATION  CAPABILITY 

The set of admissible permutations $\Omega$ has been characterized by a number of authors [Law75], [Par80], [Pea77]. The following theorem summarizes their work on network admissibility.

Let $\pi \equiv D$ be a permutation of $(0, 1, \ldots, N-1)$ which is associated with routing tags $D = (D(0), D(1), \ldots, D(N-1))$. Each routing tag $D(i)$ is used by the network input $i = (i_{n-1}, \ldots, i_1, i_0)$ of an $n$-stage Omega network to route data packets to the network output $D(i)$. According to the routing scheme of an Omega network, bit $[\rho(D(i))]_j$ is used to determine the connection of the switching element at stage $j$, $0 \le j \le n - 1$. (Note that, as defined earlier, $\rho$ is a bit-reversal permutation.)

THEOREM 5: A permutation $\pi \in \Omega$ iff for each $i$ and $j$, $0 \le i, j \le N - 1$ and $i \ne j$, either one of the following two conditions is true:

(1) $([D(i)]_{n-1:b}, [i]_{b-1:0}) \ne ([D(j)]_{n-1:b}, [j]_{b-1:0})$, for all $1 \le b \le n - 1$ (see [Law75]).

(2) There exists a Boolean function $f_b([D(i)]_{n-1:b+1}, [i]_{b-1:0})$ such that $[D(i)]_b = [i]_b \oplus f_b([D(i)]_{n-1:b+1}, [i]_{b-1:0})$, for all $0 \le b \le n - 2$, where $\oplus$ is the exclusive-or operation (see [Par80], [Pea77]). □

Theorem 5 characterizes $\Omega$ by using the bit relation of source tags and destination tags. However, according to condition (1) of Theorem 5, in order to know whether or not an arbitrary permutation belongs to $\Omega$, computation must be performed for all the possible combinations of $i$, $j$, and $b$. An algorithm to determine the admissibility of a permutation using condition (1) of Theorem 5 is described as follows:

function ADMISSIBILITY-1 ($s \in \{0, \ldots, N - 1\}$, $D$: permutation)
    for $b = n - 1$ downto 1 do
        for each $Q = \{s \mid$ the sources share the same lower $b$ address bits$\}$ do
            if DIFFERENCE($\{[D(s)]_{n-1:b} \mid s \in Q\}$) = false then return false
    return true

The function DIFFERENCE is an algorithm to determine whether or not a finite number of integers are all different. It can be shown that for problem size $N$ the fastest algorithm employed for DIFFERENCE takes a time in $O(N \log N)$ (i.e., this is the same lower bound as that of the sorting problem). By using function DIFFERENCE, the algorithm ADMISSIBILITY-1 compares the higher address bits of those destinations (in set $\{[D(s)]_{n-1:b} \mid s \in Q\}$) whose corresponding sources (in set $Q$) share the same lower address bits. Since for each $1 \le b \le n - 1$ there are $2^b$ different $Q$ sets, the algorithm ADMISSIBILITY-1 uses function DIFFERENCE $\sum_{b=1}^{n-1} 2^b = 2^n - 2$ times.
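
As a rough illustration of ADMISSIBILITY-1, the following Python sketch applies condition (1) of Theorem 5 directly. It is a sketch under stated assumptions, not the author's implementation: the names `all_different` and `admissibility_1` are ours, the permutation is passed as a list `D` with `D[s]` the destination of source `s`, and DIFFERENCE is replaced by a set-based test instead of a sort-based one.

```python
def all_different(values):
    """Stand-in for DIFFERENCE: True iff no value repeats.

    Assumption: a hash set is used here for brevity; the text analyses a
    comparison-based test with O(N log N) cost.
    """
    values = list(values)
    return len(set(values)) == len(values)


def admissibility_1(D):
    """Check condition (1) of Theorem 5 for a permutation D of 0..N-1, N = 2**n."""
    N = len(D)
    n = N.bit_length() - 1
    for b in range(n - 1, 0, -1):
        for low in range(1 << b):
            # Q: sources that share the same lower b address bits.
            Q = [s for s in range(N) if s % (1 << b) == low]
            # Their destinations must differ in the upper bits [D(s)]_{n-1:b}.
            if not all_different(D[s] >> b for s in Q):
                return False
    return True
```

For the two 16-input permutations discussed with Fig. 4.5, this check accepts the first and rejects the perfect shuffle.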

Condition (2) of Theorem 5 is a reformulation of condition (1) in order to show that a special set of permutations is admissible on the indirect binary cube network. For this special set of permutations, the Boolean function $f_b$ is easy to identify. However, in general, it is extremely difficult (if not impossible) to determine whether or not there exists such an $f_b$ which satisfies condition (2) for an arbitrary permutation, and the time complexity of using condition (2) to determine the admissibility will be much higher than that of using condition (1). In this chapter, we will show that the set $\Omega$ can be characterized in a much easier way than that given by Theorem 5, such that a simple and low-complexity algorithm can be developed to distinguish permutations in $\Omega$ from the others. This is our main work in this section.

A key idea used throughout this section is the residue system in number theory [Lee85].

DEFINITION 5: A complete residue system modulo $m$ (CRS($m$)) is a set of $m$ integers which contains exactly one element of each residue class mod $m$. □

In other words, if every element of a CRS($m$) is divided by $m$, each of the possible remainder values from 0 through $m - 1$ is obtained. We have the following natural observation on the number system composed of non-negative integers. Any consecutive $2^k$ numbers form a CRS($2^k$), which yields each remainder from 0 through $2^k - 1$ when divided by $2^k$. If the same numbers are divided by $2^{k-1}$ instead, there will be a pair for each remainder from 0 through $2^{k-1} - 1$. Thus, a CRS($2^k$) contains two representatives of each residue class mod $2^{k-1}$, i.e., a CRS($2^k$) can be partitioned into two CRS($2^{k-1}$)'s. Since there are two ways to choose each representative of a residue class mod $2^{k-1}$, as many as $4^{k-1}$ different ways of partitioning can be made on a CRS($2^k$). For example,

CRS(8): {7, 6, 0, 1, 3, 4, 5, 2}

$\equiv$ CRS(4): {7, 1, 4, 2} $\cup$ CRS(4): {6, 0, 3, 5}

$\equiv$ CRS(4): {3, 1, 0, 2} $\cup$ CRS(4): {7, 4, 6, 5}
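
Definition 5 translates into a short membership test. The Python sketch below is only an illustration (the name `is_crs` is ours); it checks the CRS(8) example above and its two splits into CRS(4) halves.

```python
def is_crs(values, m):
    """True iff `values` holds exactly one representative of each residue class mod m."""
    values = list(values)
    return len(values) == m and {v % m for v in values} == set(range(m))


if __name__ == "__main__":
    print(is_crs([7, 6, 0, 1, 3, 4, 5, 2], 8))
    print(is_crs([7, 1, 4, 2], 4), is_crs([6, 0, 3, 5], 4))
    print(is_crs([3, 1, 0, 2], 4), is_crs([7, 4, 6, 5], 4))
```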


As pointed out in Section 4.3, for each $0 \le i \le n - 1$, the set $\{\Psi(i, t) \mid 0 \le t \le 2^{n-i-1} - 1\}$ is a partition of all the network inputs $\{0, 1, \ldots, N-1\}$ and therefore it corresponds to a partition of $\{D(j) \mid 0 \le j \le N - 1\}$. We will show that the CRS property of all these $n$ partitions of $\{D(j) \mid 0 \le j \le N-1\}$ ensures that there will be no conflict in any switching element when the permutation $\pi \equiv D$ is realized on an Omega network.

THEOREM 6: A permutation $\pi$ is admissible on the subnetwork $\Phi[i, t]$ (i.e., it can be realized without conflicts), where $0 \le i \le n - 2$ and $0 \le t \le 2^{n-i-1} - 1$, iff $\{\rho(D(j)) \mid j \in \Psi(k, 2^{n-i-1} m + t)\}$ is a CRS($2^{k+1}$), for all $0 \le k \le i$ and $0 \le m \le 2^{i-k} - 1$.

PROOF: According to the routing scheme of an Omega network, bit $[\rho(D(i))]_j$ is used to determine the connection of the switching element at stage $j$, $0 \le j \le n - 1$. For any $k$ and $m$, we can imagine the subnetwork $\Phi[k, 2^{n-i-1} m + t]$ of $\Phi[i, t]$ as an independent $(k+1)$-stage network in which the routing tag used on input $j \in \Psi(k, 2^{n-i-1} m + t)$ is $[\rho(D(j))]_{k:0}$. If $\{[\rho(D(j))]_{k:0} \mid j \in \Psi(k, 2^{n-i-1} m + t)\}$ is not a CRS($2^{k+1}$), then there exist at least two network inputs of the subnetwork $\Phi[k, 2^{n-i-1} m + t]$ whose data packets are sent to the same output. That is, for at least two $x, y \in \Psi(k, 2^{n-i-1} m + t)$ with $x \ne y$, $[\rho(D(x))]_{k:0} = [\rho(D(y))]_{k:0}$. Thus, it results in a conflict in at least one switching element of $\Phi[k, 2^{n-i-1} m + t]$. This can be easily shown by induction starting from $k = 0$, in which case $\Phi[k, 2^{n-i-1} m + t]$ is a single switching element. Therefore, for all $0 \le k \le i$ and $0 \le m \le 2^{i-k} - 1$, the sets $\{\rho(D(j)) \mid j \in \Psi(k, 2^{n-i-1} m + t)\}$ must be CRS($2^{k+1}$)'s iff $\pi$ can pass the subnetwork $\Phi[i, t]$ without conflicts on switching elements. □

THEOREM 7: A permutation $\pi \in \Omega$ iff $\{\rho(D(j)) \mid j \in \Psi(i, t)\}$ is a CRS($2^{i+1}$), for all $0 \le i \le n - 1$ and $0 \le t \le 2^{n-i-1} - 1$.

PROOF: This proof is simply an extension of that of Theorem 6. When $i = n - 1$, $\{\rho(D(j)) \mid j \in \Psi(n - 1, 0)\} = \{0, 1, \ldots, N-1\}$ is a CRS($N$), which is a trivial case and is always true. Thus, the permutation $\pi \in \Omega$ iff $\pi$ is admissible on both subnetworks $\Phi[n - 2, 0]$ and $\Phi[n - 2, 1]$. □

According to Theorem 7, a method is given to determine whether or not a given permutation is an admissible one of an Omega network. Excluding the trivial case $i = n - 1$, the work is composed of a total of

\[ \sum_{i=0}^{n-2} 2^{n-i-1} = N/2 + N/4 + \cdots + 2 = N - 2 \]

subtasks, and each subtask needs to determine the CRS property of a set of integers (i.e., the set $\{\rho(D(j)) \mid j \in \Psi(i, t)\}$). Two example permutations are shown in Fig. 4.5 for a 16 × 16 Omega network. The first one is an admissible permutation (6, 4, 14, 8, 11, 15, 5, 12, 13, 10, 3, 7, 0, 1, 9, 2). The second one is a perfect shuffle permutation (0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15), which is not admissible. For the first permutation, it can be shown that each $\{\rho(D(j)) \mid j \in \Psi(i, t)\}$, where $0 \le i \le 2$ and $0 \le t \le 2^{3-i} - 1$, is a CRS($2^{i+1}$). To see why the second permutation is not admissible, let us check the subnetwork $\Phi[1, 0]$. The routing tags used for this subnetwork are

\[ \{[\rho(D(j))]_{1:0} \mid j \in \Psi(1, 0)\} = \{[\rho(D(0))]_{1:0}, [\rho(D(4))]_{1:0}, [\rho(D(8))]_{1:0}, [\rho(D(12))]_{1:0}\} = \{[0]_{1:0}, [1]_{1:0}, [8]_{1:0}, [9]_{1:0}\} = \{00, 01, 00, 01\} \ne \mathrm{CRS}(4). \]

This means that data packets from network inputs 0 and 8 are sent to the same network output 0 of $\Phi[1, 0]$. Similarly, data packets from network inputs 4 and 12 are sent to the same network output 1 of $\Phi[1, 0]$. They cause conflicts on switching elements of both stage 0 and stage 1 of $\Phi[1, 0]$.
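
The check described by Theorem 7 can be sketched in Python as follows. This is an illustrative sketch, not the author's code: the names `bit_reverse` and `admissible_by_crs` are ours, and the groups $\Psi(i, t)$ are assumed to be the inputs sharing their lower $n - i - 1$ address bits, as in the partitions listed for $N = 16$.

```python
def bit_reverse(x, n):
    """rho: reverse the n-bit binary representation of x."""
    return int(format(x, "0{}b".format(n))[::-1], 2)


def admissible_by_crs(D):
    """Theorem 7 test: every {rho(D(j)) | j in Psi(i, t)} must be a CRS(2**(i+1))."""
    N = len(D)
    n = N.bit_length() - 1
    for i in range(n):
        groups = 1 << (n - i - 1)      # number of sets Psi(i, t)
        modulus = 1 << (i + 1)         # size of each set, and the CRS modulus
        for t in range(groups):
            residues = {bit_reverse(D[j], n) % modulus
                        for j in range(N) if j % groups == t}
            # A set of `modulus` integers is a CRS iff all residues are distinct.
            if len(residues) != modulus:
                return False
    return True


if __name__ == "__main__":
    shuffle = [0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15]
    # Routing tags of the subnetwork Phi[1, 0], i.e. inputs 0, 4, 8, 12.
    print([bit_reverse(shuffle[j], 4) % 4 for j in (0, 4, 8, 12)])
    print(admissible_by_crs(shuffle))
```

Because each group holds exactly $2^{i+1}$ inputs, counting distinct residues is enough to decide the CRS property.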

From Theorem 7, the set $\Omega$ is characterized by using the residue classes of destination tags rather than the bit relations of source tags and destination tags. However, compared with Theorem 5, we do not gain much in computational effort, since the modulo operations and the work to determine the CRS properties still consume a lot of time. The same problem arose in the work of Lee [Lee85]. Nevertheless, the result of Theorem 7 is still useful. We show next that a more effective way to characterize $\Omega$ than that of Theorem 5 can be derived from Theorem 7. Before we discuss that, let us see what the characteristics of Omega and inverse Omega networks are.

In the first transition sequence in Section 4.2, source bit $s_{n-1-i}$ is moved to the LSB position of $s^i$ at stage $i$ and is replaced by the destination bit $d_{n-1-i}$ of $D(s)$ in the LSB position of $D^i(s)$. That is, by the physical meaning, a data transfer path of the switching element $[s^i]_{n-1:1}$ (or $[D^i(s)]_{n-1:1}$) is connected from input port $s_{n-1-i}$ to output port $d_{n-1-i}$. Therefore, the order for the source bits to be removed is $n-1, n-2, n-3, \ldots, 2, 1, 0$ and the order for the destination bits to be introduced is also $n-1, n-2, n-3, \ldots, 2, 1, 0$. We can use two vectors $O = (0, 1, 2, \ldots, n-2, n-1)$ and $I^{-1} = (0, 1, 2, \ldots, n-2, n-1)$, which correspond to two permutation functions $O$ and $I^{-1}$ in $\{\beta_n\}$, to represent these two sequences, respectively. They are referred to as the characteristic functions of an Omega network. Thus, by denoting

$O = (O(n-1), O(n-2), \ldots, O(2), O(1), O(0))$
and $I^{-1} = (I^{-1}(n-1), I^{-1}(n-2), \ldots, I^{-1}(2), I^{-1}(1), I^{-1}(0))$

we mean that at stage $j$ source bit $s_{O(j)}$ will be removed and replaced by destination bit $d_{I^{-1}(j)}$. Moreover, the meaning of the permutation function $I$ is as follows: if the source bit removed at stage $j$ is replaced by bit $d_j$, then the order of bits of $I(D(s))$ represents the disturbed order of bits of $D(s)$. For an Omega network, we have $O = \rho$ and $I = I^{-1} = \rho$. From Theorems 1 and 2, it is obvious that the function $O = \rho$ can uniquely determine those $n$ partitions $\{\Psi(i, t) \mid 0 \le t \le 2^{n-i-1} - 1\}$, $0 \le i \le n - 1$. From Theorem 7, this in turn means that all the admissible permutations $\Omega$ can be uniquely characterized by functions $O$ and $I$.

[Fig. 4.5. Two example permutations on a 16 × 16 Omega network.]

Let the characteristic functions of an inverse Omega network be denoted by $O_R$ and $I_R$. Note that the $(n-i)$th stage of an Omega network becomes the $i$th stage of its inverse network. It is clear that any permutation $\pi \in \Omega$ iff $\pi^{-1}$ is an admissible permutation of the inverse Omega network. We may denote the set of all the admissible permutations of an inverse Omega network by $\Omega_{-1}$. Thus, for any transition sequence of an Omega network, we have a new explanation for its inverse Omega network: a source bit of $D(s)$ is moved to the LSB position of $D^i(s)$ at stage $n-i$ and is replaced by a destination bit of $s$ in the LSB position of $s^i$. That is, at stage $j$ source bit $d_{O_R(j)}$ will be removed and replaced by destination bit $s_{I_R^{-1}(j)}$. Therefore, we have $O_R$ and $I_R^{-1}$ for an inverse Omega network as follows:

$O_R = \rho I^{-1}$, $O_R = (n-1, n-2, \ldots, 1, 0)$

$I_R^{-1} = \rho O$, $I_R^{-1} = (n-1, n-2, \ldots, 1, 0)$.

Similarly, the function $O_R = \rho \cdot \rho$, which is an identity permutation, uniquely determines those $n$ partitions $\{\Psi_R(i, t) \mid 0 \le t \le 2^{n-i-1} - 1\}$, $0 \le i \le n - 1$. Thus, $\Omega_{-1}$ is uniquely characterized by functions $O_R$ and $I_R$. Note that it is easy to prove that no two subnetworks $\Phi[a, b]$ and $\Phi_R[c, d]$ of an Omega network and its inverse Omega network, respectively, have the same set of switching elements at any stage.

The following theorem, derived from Theorem 7, gives a simple closed form of $\Omega$ in terms of bit relations of destination tags with respect to permutable substructures of an Omega network and its inverse network.

THEOREM 8: A permutation $\pi \in \Omega$ iff

\[ \sum_{j \in \Psi(i,t)} [\rho(D(j))]_i = \sum_{j \in \Psi_R(i,t)} [D^{-1}(j)]_i = 2^i, \]

for all $0 \le i \le n - 1$ and $0 \le t \le 2^{n-i-1} - 1$. That is, taken over all $j \in \Psi(i, t)$ and over all $j \in \Psi_R(i, t)$, the sums of the $i$th bits of the $\rho(D(j))$'s and of the $D^{-1}(j)$'s are both equal to $2^i$.

PROOF: The proof is based on the following fact. Let $\{R_j \mid 0 \le j \le 2^{i+1} - 1\}$ be a CRS($2^{i+1}$). Then $\sum_{j=0}^{2^{i+1}-1} [R_j]_i = 2^i$ is always true, i.e., the sum of the $i$th bits of $R_j$, for all $0 \le j \le 2^{i+1} - 1$, is equal to $2^i$. Thus, it immediately implies that $\sum_{j \in \Psi(i,t)} [\rho(D(j))]_i = 2^i$ if $\{\rho(D(j)) \mid j \in \Psi(i, t)\}$ is a CRS($2^{i+1}$), $0 \le i \le n - 1$ and $0 \le t \le 2^{n-i-1} - 1$.

(only if) From Theorem 7, if the permutation $\pi \in \Omega$, then $\{\rho(D(j)) \mid j \in \Psi(i, t)\}$ is a CRS($2^{i+1}$), for any $i$ and $t$. Thus, from the above fact, $\sum_{j \in \Psi(i,t)} [\rho(D(j))]_i = 2^i$, for any $i$ and $t$. Since if $\pi \in \Omega$ then $\pi^{-1} \equiv D^{-1} \in \Omega_{-1}$, the equation $\sum_{j \in \Psi_R(i,t)} [D^{-1}(j)]_i = 2^i$ is also true.

(if) Note that actually $\sum_{j=0}^{2^{i+1}-1} [R_j]_k = 2^i$, for all $0 \le k \le i$, iff $\{R_j \mid 0 \le j \le 2^{i+1} - 1\}$ is a CRS($2^{i+1}$). Thus, we need to prove that if $\sum_{j \in \Psi(i,t)} [\rho(D(j))]_i = \sum_{j \in \Psi_R(i,t)} [D^{-1}(j)]_i = 2^i$, for any $i$ and $t$, then $\{\rho(D(j)) \mid j \in \Psi(i, t)\}$ is a CRS($2^{i+1}$), for any $i$ and $t$, which in turn implies that the permutation $\pi \in \Omega$.

According to Theorems 2 and 3, for any $0 \le i \le n - 2$ and $0 \le t^*, t^{**} \le 2^{n-i-1} - 1$, there exist two sets $\Psi(i, t^*)$ and $\Psi(i, t^{**})$ such that $\Psi(i, t^*) \cap \Psi(i, t^{**}) = \emptyset$ and $\Psi(i, t^*) \cup \Psi(i, t^{**}) = \Psi(i + 1, t)$. For $i = 0$, if the following conditions are true:


$\sum_{j \in \Psi(0,t^*)} [\rho(D(j))]_0 = 2^0$ (i.e., $\{\rho(D(j)) \mid j \in \Psi(0, t^*)\}$ is a CRS(2)),

$\sum_{j \in \Psi(0,t^{**})} [\rho(D(j))]_0 = 2^0$ (i.e., $\{\rho(D(j)) \mid j \in \Psi(0, t^{**})\}$ is a CRS(2)),

and $\sum_{j \in \Psi(1,t)} [\rho(D(j))]_1 = 2$,

then there are two possibilities:

(1) $\{\rho(D(j)) \mid j \in \Psi(1, t)\}$ is a CRS($2^2$), i.e., $\{[\rho(D(j))]_{1:0} \mid j \in \Psi(1, t)\} = \{00, 01, 10, 11\}$.

(2) $\{[\rho(D(j))]_{1:0} \mid j \in \Psi(1, t)\} = \{00, 00, 11, 11\}$ or $\{01, 01, 10, 10\}$. For each case, there must exist some $\Phi[1, t']$ such that $\{[\rho(D(j))]_{1:0} \mid j \in \Psi(1, t')\} = \{00, 00, 11, 11\}$ or $\{01, 01, 10, 10\}$; otherwise there will be at least one $\Phi[1, t'']$ such that $\sum_{j \in \Psi(1,t'')} [\rho(D(j))]_1 \ne 2$. If this is true, then it will result in an odd number of 1's in the routing tags which are used at stage $b$ of at least one subnetwork $\Phi_R[b, t''']$, where $0 \le b \le n - 2$. That is, we have $\sum_{j \in \Psi_R(b,t''')} [D^{-1}(j)]_b \ne 2^b$ for at least one subnetwork $\Phi_R[b, t''']$. Thus, it will always be detected that $\{\rho(D(j)) \mid j \in \Psi(1, t)\}$ is not a CRS($2^2$).

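By induction, we can show that if $\sum_{j \in \Psi(k,t)} [\rho(D(j))]_k = \sum_{j \in \Psi_R(k,t)} [D^{-1}(j)]_k = 2^k$, for all $0 \le k \le i$ and $0 \le t \le 2^{n-k-1} - 1$, then, for each $0 \le k \le i$ and $0 \le t \le 2^{n-k-1} - 1$, $\{\rho(D(j)) \mid j \in \Psi(k, t)\}$ is a CRS($2^{k+1}$). Thus, if $\sum_{j \in \Psi(i,t)} [\rho(D(j))]_i = \sum_{j \in \Psi_R(i,t)} [D^{-1}(j)]_i = 2^i$, for any $i$ and $t$, then $\{\rho(D(j)) \mid j \in \Psi(i, t)\}$ is a CRS($2^{i+1}$), for any $i$ and $t$. □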

Note that Theorem 8 does not imply that if $\pi \in \Omega$ then $\pi \in \Omega_{-1}$, or vice versa. For example, let us consider the admissible permutation in Fig. 4.5 and check the subnetwork $\Phi[2, 1]$, where the corresponding set $\Psi(2, 1) = \{1, 3, 5, 7, 9, 11, 13, 15\}$. We have

\[ \{[\rho(D(j))]_2 \mid j \in \Psi(2, 1)\} = \{[\rho(D(1))]_2, [\rho(D(3))]_2, [\rho(D(5))]_2, [\rho(D(7))]_2, [\rho(D(9))]_2, [\rho(D(11))]_2, [\rho(D(13))]_2, [\rho(D(15))]_2\} \]
\[ = \{[2]_2, [1]_2, [15]_2, [3]_2, [5]_2, [14]_2, [8]_2, [4]_2\} = \{0, 0, 1, 0, 1, 1, 0, 1\}. \]

Thus, $\sum_{j \in \Psi(2,1)} [\rho(D(j))]_2 = 2^2$. Moreover, $\Psi(2, 1) = \Psi(1, 1) \cup \Psi(1, 3)$, where $\Psi(1, 1) = \{1, 5, 9, 13\}$ and $\Psi(1, 3) = \{3, 7, 11, 15\}$. It also can be shown that $\sum_{j \in \Psi(1,1)} [\rho(D(j))]_1 = 2$ and $\sum_{j \in \Psi(1,3)} [\rho(D(j))]_1 = 2$. Similarly, $D^{-1}$ = (12, 13, 15, 10, 1, 6, 0, 11, 3, 14, 9, 4, 7, 8, 2, 5). We can obtain that $\sum_{j \in \Psi_R(1,t)} [D^{-1}(j)]_1 = 2$.
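
The bit sums in this example can be reproduced with a few lines of Python. The sketch below covers only the first of the two sums in Theorem 8, the one over $\Psi(i, t)$, because the sets $\Psi_R(i, t)$ are not spelled out in this passage; the names `bit_reverse`, `bit_sum`, and `inverse` are ours, and $\Psi(i, t)$ is again assumed to group inputs by their lower $n - i - 1$ address bits.

```python
def bit_reverse(x, n):
    """rho: reverse the n-bit binary representation of x."""
    return int(format(x, "0{}b".format(n))[::-1], 2)


def bit_sum(D, n, i, t):
    """Sum of the i-th bits of rho(D(j)) over j in Psi(i, t)."""
    groups = 1 << (n - i - 1)
    return sum((bit_reverse(D[j], n) >> i) & 1
               for j in range(1 << n) if j % groups == t)


def inverse(D):
    """D^{-1} as a list: inverse(D)[D[j]] == j."""
    inv = [0] * len(D)
    for j, d in enumerate(D):
        inv[d] = j
    return inv


if __name__ == "__main__":
    D = [6, 4, 14, 8, 11, 15, 5, 12, 13, 10, 3, 7, 0, 1, 9, 2]
    print(bit_sum(D, 4, 2, 1), bit_sum(D, 4, 1, 1), bit_sum(D, 4, 1, 3))
    print(inverse(D))
```

The second sum of Theorem 8 would be computed the same way over $D^{-1}$ once the groups $\Psi_R(i, t)$ are fixed.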


Theorem 8 implies that for each permutable structure $\Phi[i, t]$, the work to determine the CRS property of $\{\rho(D(j)) \mid j \in \Psi(i, t)\}$, $|\Psi(i, t)| = 2^{i+1}$, in Theorem 7 can be replaced by two bit summation operations. That is, we sum the $i$th bits of all the destination tags $\rho(D(j))$, where $j \in \Psi(i, t)$, and sum the $i$th bits of all the destination tags $D^{-1}(j)$, where $j \in \Psi_R(i, t)$. Then, we check whether or not both of the results are equal to $2^i$. Both the admissibility conditions (Theorem 6) of the permutable substructures of an Omega network and of its inverse network (an inverse Omega network) need to be satisfied. An algorithm to determine the admissibility of a permutation based on Theorem 8 is described as follows:


function ADMISSIBILITY-2 ($j \in \{0, \ldots, N - 1\}$, $D$: permutation)
    for $i = 0$ to $n - 1$ do
        for $t = 0$ to $2^{n-i-1} - 1$ do
            if ($\sum_{j \in \Psi(i,t)} [\rho(D(j))]_i = \sum_{j \in \Psi_R(i,t)} [D^{-1}(j)]_i = 2^i$) = false
                then return false
    return true


Our work can now be compared with the previous results summarized in Theorem 5. Conditions in Theorem 5 are essentially the non-conflict criteria for any switching element. That is, no two paths of a permutation routing pass through the same input port of a switching element, i.e.,

\[ ([i]_{b-1:0}, [D(i)]_{n-1:b}) \ne ([j]_{b-1:0}, [D(j)]_{n-1:b}), \]

for any $0 \le i, j \le N - 1$, $i \ne j$, and $1 \le b \le n - 1$.

This is a one-dimension viewpoint to understand what the admissible permutations of an Omega network are. On the other hand, our work exploits all the structures (subnetworks) which are relevant to the permutation routing behavior of an Omega network. Then, we develop the non-conflict criteria for these structures with the aid of the structure of its inverse Omega network; as mentioned above, these criteria can be sufficiently represented by a very simple bit-summation condition. Thus, our work provides a two-dimension viewpoint to understand what the admissible permutations of an Omega network are. It is obvious that our method is simpler and easier than previous ones.

4.5.  GENERAL  MODEL 

Generally speaking, there exists a class of topologically equivalent networks which are constructed by the BPC permutation connections and possess the unique-path and full-access properties. As we will see, even though this class of networks represents only a subset of Banyan networks, it provides more attractive communication aspects than other networks which are constructed by irregular connection patterns. For example, the BPC permutation connections for Omega networks are perfect shuffle permutations. Each network of this class has similar routing behavior and thus a similar expression of transition sequences to that of Omega networks. This class of networks includes the six networks mentioned in [WuFe81] as special cases. The connection patterns used between their stages are a specified set from $\{\beta_n\}$. Their transition sequence, which represents any path connecting network input $s = (s_{n-1}, \ldots, s_1, s_0)$ to network output $D(s) = (d_{n-1}, \ldots, d_1, d_0)$, has the following properties.

(1) Each bit of the source $s$ (or its complement) will be permuted to the position of LSB in some $s^i$ and then be replaced by a bit of the destination $D(s)$ (or its complement) in $D^i(s)$, $0 \le i \le n - 1$. Therefore, there exist two permutation functions $O, I \in \{\beta_n\}$ such that $O(s)$ corresponds to the order for bits of $s$ to be permuted to the position of LSB (i.e., $[O(s)]_i$ is the LSB in $s^i$) and $I^{-1}(D(s))$ corresponds to the order for bits of $D(s)$ to replace bits of $s$ (i.e., $[I^{-1}(D(s))]_i$ replaces $[O(s)]_i$ in the LSB position of $D^i(s)$). The physical meaning of permutation function $I$ is as follows: if the $i$th bit of a number $X = (x_{n-1}, x_{n-2}, \ldots, x_1, x_0)$ instead of bit $[I^{-1}(D(s))]_i$ replaces $[O(s)]_i$ in $D^i(s)$, then $I(X)$ represents the final destination where the source $s$ will reach. These two BPC permutation functions $O$ and $I$ are referred to as the characteristic functions of a network.

(2) Data packets from source $s$ are routed from input port $[O(s)]_i$ to output port $[I^{-1}(D(s))]_i$ of a switching element at stage $i$, and the address of this switching element is either $[s^i]_{n-1:1}$ or $[D^i(s)]_{n-1:1}$.

(3) The routing scheme of this class of networks can be described as follows. Let the symbol $\oplus$ represent the exclusive-or operation. Bit $[I^{-1}(D(s))]_i$ is used as the routing tag for the switching element at stage $i$ such that data packets are routed from input port $[O(s)]_i$ to output port $[I^{-1}(D(s))]_i$. Bit $[O(s)]_i \oplus [I^{-1}(D(s))]_i$ is used to determine the state of the switching element at stage $i$ if global routing is considered and no conflict occurs. That is, if $[O(s)]_i \oplus [I^{-1}(D(s))]_i = 0$ then the switching element will be in a straight connection state (i.e., 0-state), else the switching element will be in an exchange connection state (i.e., 1-state). After a data packet traverses stage $i$, bit $[O(s)]_i$ (i.e., the label of the input port from which this incoming data packet comes) is attached to this data packet in order to recover the information of the source address. We call this kind of routing scheme the source-preserved and destination-oriented routing scheme. It is clear that the routing behavior of any network in this class can be uniquely characterized by functions $O$ and $I$. Note that not all the networks with full-access capability and the unique-path property possess this kind of simple routing scheme. In general, for a network with irregular connection patterns, the routing tag used at each stage is a function of both source input and destination output.
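
For the Omega network itself, where $O = \rho$ and $I^{-1} = \rho$, the routing scheme in (3) reduces to shuffling the address and overwriting its LSB with the next destination bit. The Python sketch below simulates that special case only. It is an illustration under stated assumptions: the names `omega_path` and `conflict_free` are ours, and a conflict is taken to mean that two packets occupy the same link address after some stage.

```python
def omega_path(s, d, n):
    """Trace a packet from input s to output d through an n-stage Omega network.

    Returns (link addresses after each stage, switch states per stage).
    """
    mask = (1 << n) - 1
    addr, path, states = s, [], []
    for stage in range(n):
        addr = ((addr << 1) | (addr >> (n - 1))) & mask   # perfect shuffle
        in_port = addr & 1                     # source bit s_{n-1-stage}
        tag = (d >> (n - 1 - stage)) & 1       # destination bit d_{n-1-stage}
        states.append(in_port ^ tag)           # 0 = straight, 1 = exchange
        addr = (addr & ~1) | tag               # leave on output port `tag`
        path.append(addr)
    return path, states


def conflict_free(D):
    """True iff no two packets of permutation D share a link after any stage."""
    N = len(D)
    n = N.bit_length() - 1
    paths = [omega_path(s, D[s], n)[0] for s in range(N)]
    return all(len({p[k] for p in paths}) == N for k in range(n))
```

On the two permutations of Fig. 4.5 this brute-force simulation gives the same verdicts as the text: the first passes without conflict and the perfect shuffle does not.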

For example, in Fig. 4.6, a 16 × 16 4-stage network defined by a sequence of BPC permutation operations is shown. Let the characteristic functions of this network, $O$ and $I$, be specified by vectors $O$ and $I$, respectively. Let the interconnection pattern $P_i$, $0 \le i \le n$, be specified by vector $P_i$, with $P_0 = (2, -1, -0, 3)$, $P_1 = (2, 3, 0, -1)$, $P_2 = (-0, 1, 3, -2)$, $P_3 = (2, -0, 3, 1)$, and $P_4 = (1, 3, 0, 2)$. The transition sequence is:

$s = (s_3, s_2, s_1, s_0)$, $s^0 = (s_2, \bar{s}_1, \bar{s}_0, s_3)$

$D^0(s) = (s_2, \bar{s}_1, \bar{s}_0, d_2)$, $s^1 = (\bar{s}_1, s_2, d_2, s_0)$

$D^1(s) = (\bar{s}_1, s_2, d_2, \bar{d}_3)$, $s^2 = (d_3, d_2, \bar{s}_1, \bar{s}_2)$

$D^2(s) = (d_3, d_2, \bar{s}_1, \bar{d}_0)$, $s^3 = (d_2, d_0, d_3, \bar{s}_1)$

$D^3(s) = (d_2, d_0, d_3, d_1)$, $s^4 = (d_3, d_2, d_1, d_0) = D(s)$

Thus, we have $O = (-1, -2, 0, 3)$, $I = (-1, 0, 3, -2)$, $I^{-1} = (1, -0, -3, 2)$, $|O| = (1, 2, 0, 3)$ and $|I^{-1}| = (1, 0, 3, 2)$. For any path connecting a source $s$ to a destination $D(s)$, bit $d_2$ is used as the routing tag at stage 0, bit $\bar{d}_3$ is used as the routing tag at stage 1, bit $\bar{d}_0$ is used as the routing tag at stage 2, and bit $d_1$ is used as the routing tag at stage 3. The states of switching elements from stage 0 to stage 3 are determined by $s_3 \oplus d_2$, $s_0 \oplus \bar{d}_3$, $\bar{s}_2 \oplus \bar{d}_0$, and $\bar{s}_1 \oplus d_1$, respectively. Hence, for the path connecting $s = 1$ to $D(s) = 4$, the routing tags are $[I^{-1}(D(s))] = (d_1, \bar{d}_0, \bar{d}_3, d_2) = (0, 1, 1, 1)$ and the states are $(\bar{s}_1 \oplus d_1, \bar{s}_2 \oplus \bar{d}_0, s_0 \oplus \bar{d}_3, s_3 \oplus d_2) = (1, 0, 0, 1)$.
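
The example of Fig. 4.6 can be traced mechanically. The Python sketch below is a hedged illustration, not the author's code: it assumes that a BPC vector lists, from MSB to LSB, which input bit feeds each output bit, with a leading minus sign (including "-0") meaning that the bit is complemented. The helper names `parse`, `bpc`, and `route` are ours.

```python
N_BITS = 4


def parse(text):
    """Turn '2 -1 -0 3' into [(index, complemented), ...], MSB position first."""
    return [(int(tok.lstrip("-")), tok.startswith("-")) for tok in text.split()]


def bpc(vec, x):
    """Apply a BPC (bit-permute-complement) vector to the 4-bit number x."""
    out = 0
    for k, (idx, neg) in enumerate(vec):
        pos = N_BITS - 1 - k                       # output bit position
        out |= (((x >> idx) & 1) ^ int(neg)) << pos
    return out


P = [parse(v) for v in ("2 -1 -0 3", "2 3 0 -1", "-0 1 3 -2", "2 -0 3 1", "1 3 0 2")]
O = parse("-1 -2 0 3")
I_INV = parse("1 -0 -3 2")


def route(s, d):
    """Return (final address, routing tag per stage, switch state per stage)."""
    tags = bpc(I_INV, d)
    addr, tag_bits, states = s, [], []
    for i in range(N_BITS):
        addr = bpc(P[i], addr)        # interconnection pattern before stage i
        in_port = addr & 1            # equals [O(s)]_i
        tag = (tags >> i) & 1         # equals [I^{-1}(D(s))]_i
        tag_bits.append(tag)
        states.append(in_port ^ tag)  # 0 = straight, 1 = exchange
        addr = (addr & ~1) | tag
    return bpc(P[N_BITS], addr), tag_bits, states


if __name__ == "__main__":
    print(route(1, 4))
```

The per-stage lists are ordered from stage 0 to stage 3, so they read in the reverse order of the vectors $(0, 1, 1, 1)$ and $(1, 0, 0, 1)$ quoted in the text.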

As a network in this class is specified by its two characteristic functions $O$ and $I$, it can be shown that function $O$ uniquely determines all the sets $\Psi(i, t)$, $0 \le i \le n - 1$ and $0 \le t \le 2^{n-i-1} - 1$. $I^{-1}(D)$ is the actual permutation on which computation should be performed to determine the admissibility of a given permutation $D$. That is, the permutation capability of this network is uniquely characterized by its two characteristic functions. Therefore, all our work done in previous sections can be easily extended to the general model by using characteristic functions. We summarize the generalization as the following theorems. In the following, let $\Gamma$ be an $n$-stage network in the class specified by characteristic functions $O$ and $I$.

THEOREM 9: Let $\Gamma_R$ be the inverse network of $\Gamma$. Then, the characteristic functions of $\Gamma_R$ are

$O_R = \rho I^{-1}$ and $I_R^{-1} = \rho O$.

PROOF: For any transition sequence of $\Gamma$, we have a new explanation for $\Gamma_R$. Note that the $i$th stage of $\Gamma_R$ becomes the $(n-i)$th stage of $\Gamma$. Thus, $\rho I^{-1}(D(s))$ corresponds to the order for bits of $D(s)$ to be permuted to the position of LSB (i.e., $[\rho I^{-1}(D(s))]_i$ is the LSB in $D^i(s)$) and $\rho O(s)$ corresponds to the order for bits of $s$ to replace bits of $D(s)$ (i.e., $[\rho O(s)]_i$ replaces $[\rho I^{-1}(D(s))]_i$ in the LSB position of $s^i$). □

For example, let the interconnection pattern $R_i$, $0 \le i \le n$, be specified by vector $R_i$ for the inverse network of the network in Fig. 4.6. Then, we have $R_0 = P_4^{-1} = (2, 0, 3, 1)$, $R_1 = P_3^{-1} = (1, 3, 0, -2)$, $R_2 = P_2^{-1} = (1, -0, 2, -3)$, $R_3 = P_1^{-1} = (2, 3, -0, 1)$, and $R_4 = P_0^{-1} = (0, 3, -2, -1)$. The transition sequence for connecting source input $D(s) = (d_3, d_2, d_1, d_0)$ to destination output $s = (s_3, s_2, s_1, s_0)$ is:

$D(s) = (d_3, d_2, d_1, d_0)$, $s^0 = (d_2, d_0, d_3, d_1)$

$D^0(D(s)) = (d_2, d_0, d_3, \bar{s}_1)$, $s^1 = (d_3, d_2, \bar{s}_1, \bar{d}_0)$

$D^1(D(s)) = (d_3, d_2, \bar{s}_1, \bar{s}_2)$, $s^2 = (\bar{s}_1, s_2, d_2, \bar{d}_3)$

$D^2(D(s)) = (\bar{s}_1, s_2, d_2, s_0)$, $s^3 = (s_2, \bar{s}_1, \bar{s}_0, d_2)$

$D^3(D(s)) = (s_2, \bar{s}_1, \bar{s}_0, s_3)$, $s^4 = (s_3, s_2, s_1, s_0) = s$.

It is clear that $O_R = \rho \cdot (1, -0, -3, 2) = (2, -3, -0, 1)$ and $I_R^{-1} = \rho \cdot (-1, -2, 0, 3) = (3, 0, -2, -1)$.

[Fig. 4.6. A 16 × 16 network defined by the general model.]
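
The inverse vectors quoted above can be recomputed with a small Python sketch. As before, this assumes that a BPC vector lists, from MSB to LSB, the input bit feeding each output bit, with a minus sign marking complementation, and that $\rho$ acts on a vector by reversing its entries; the names `parse`, `show`, `inverse`, and `rho` are ours.

```python
def parse(text):
    """Turn '2 -0 3 1' into [(index, complemented), ...], MSB position first."""
    return [(int(tok.lstrip("-")), tok.startswith("-")) for tok in text.split()]


def show(vec):
    """Format a vector back into the '2 -0 3 1' notation."""
    return " ".join(("-" if neg else "") + str(idx) for idx, neg in vec)


def inverse(vec):
    """Inverse BPC vector: if output bit pos takes input bit idx, swap the roles."""
    n = len(vec)
    inv = [None] * n
    for k, (idx, neg) in enumerate(vec):
        pos = n - 1 - k                 # output position fed by input bit idx
        inv[n - 1 - idx] = (pos, neg)   # in the inverse, bit idx is fed by bit pos
    return inv


def rho(vec):
    """Bit reversal applied to a vector: reverse the order of its entries."""
    return vec[::-1]


if __name__ == "__main__":
    for name, text in (("P4", "1 3 0 2"), ("P3", "2 -0 3 1"), ("P2", "-0 1 3 -2"),
                       ("P1", "2 3 0 -1"), ("P0", "2 -1 -0 3")):
        print(name, "inverse:", show(inverse(parse(text))))
    print("O_R:", show(rho(parse("1 -0 -3 2"))))
    print("I_R^-1:", show(rho(parse("-1 -2 0 3"))))
```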

THEOREM 10: A permutation $\pi$ is admissible on $\Gamma$ iff for each $i$ and $j$, $0 \le i, j \le N - 1$ and $i \ne j$, either one of the following two conditions is true:

(1) $([I^{-1}(D(i))]_{b-1:0}, [O(i)]_{n-1:b}) \ne ([I^{-1}(D(j))]_{b-1:0}, [O(j)]_{n-1:b})$, for all $1 \le b \le n - 1$.

(2) $[I^{-1}(D(i))]_b = [O(i)]_b \oplus f_b([I^{-1}(D(i))]_{b-1:0}, [O(i)]_{n-1:b+1})$, where $\oplus$ is the exclusive-or operation and $f_b$ is a Boolean function.

PROOF: This theorem is a generalization of Theorem 5. By using the same criteria as Theorem 5, the above two conditions give the non-conflict criteria for any switching element of $\Gamma$. That is, no two paths of the permutation $\pi$ pass through the same input port of a switching element of $\Gamma$. □

THEOREM 11: By forcing all the switching elements of $\Gamma$ at stage $i$ to 0-state or 1-state, two disjoint $(n-1)$-stage subnetworks are formed such that in each subnetwork, the addresses of network inputs agree in bit $[|O|(s)]_i$ and the addresses of network outputs agree in bit $[|I^{-1}|(D(s))]_i$.

PROOF: The proof is similar to that of Theorem 1 except that we use $[O(s)]_i$ instead of the source bit used in that proof. That is, we have the following statement: forcing all switching elements at stage $i$ to 0-state (1-state) is equivalent to forcing the LSB, $[O(s)]_i$, of $s^i$ to be replaced by $[I^{-1}(D(s))]_i = [O(s)]_i$ ($1 - [O(s)]_i$) in the LSB position of $D^i(s)$. Thus, for the 0-state case, the addresses of network inputs of each $(n-1)$-stage subnetwork agree in bit $[O(s)]_i = [I^{-1}(D(s))]_i$ and the addresses of network outputs agree in $[I^{-1}(D(s))]_i = [O(s)]_i$. Similarly, for the 1-state case, the addresses of network inputs of each $(n-1)$-stage subnetwork agree in bit $[O(s)]_i = 1 - [I^{-1}(D(s))]_i$ and the addresses of network outputs agree in $[I^{-1}(D(s))]_i = 1 - [O(s)]_i$. □

For  example,  by  forcing  all  the  switching  elements  at  stage  1  of  the  16x16  4-stage  net¬ 
work  in  Fig.  4.6,  the  addresses  of  network  inputs  of  each  3-stage  subnetwork  agree  in  bit 
[(9(5)]j  =z  5q  =  1  -  [/“^(D(5))]i  =  (d)2  =  and  the  addresses  of  network  outputs  agree  in 
[/“^(D(5))]i  =  ^3  =  1  -  [0(5)],  =  Tq.  That  is,  each  3-stage  subnetwork  has  network  inputs 
$\{(c, c, c, s_0)\}$ and network outputs $\{(s_0, c, c, c)\}$. Thus, one of the two subnetworks has
network inputs {0, 2, 4, 6, 8, 10, 12, 14} and network outputs {0, 1, 2, 3, 4, 5, 6, 7}.

THEOREM 12: Let $\Phi[i, t]$, $0 \le i \le n-1$ and $0 \le t \le 2^{n-i-1}-1$, be a subnetwork
produced by performing the same partitioning scheme (as mentioned in Section 4.3) on $\Gamma$. For
each $i$, $0 \le i \le n-1$, let $\{\Psi(i, u) \mid 0 \le u \le 2^{n-i-1}-1\}$ be the partition on the network
inputs $\{0, 1, \ldots, N-1\}$ corresponding to the set of subnetworks $\{\Phi[i, t] \mid 0 \le t \le 2^{n-i-1}-1\}$.
Then, $\Psi(i, u) = \{(v_{n-1}, \ldots, v_1, v_0)\}$ such that $v_l = c$ for all $l \ne j$, where $s_j = [|O|(s)]_k$,
$i+1 \le k \le n-1$. Thus, $\#(c) = i+1$ in $\Psi(i, u)$.

Proof:  The  proof  is  similar  to  that  of  Theorem  2  and  is  based  on  Theorem  10.  □ 

Theorem 12 shows that function $|O|$ uniquely determines all the $n$ partitions
$\{\Psi(i, u) \mid 0 \le u \le 2^{n-i-1}-1\}$, $0 \le i \le n-1$. For example, let us find the partition
$\{\Psi(0, u) \mid 0 \le u \le 2^{3}-1\}$ for the 16x16 4-stage network in Fig. 4.6. Since
$|O|(s) = (s_1, s_2, s_0, s_3)$, we have $\{s_j\} = \{[|O|(s)]_k \mid 1 \le k \le 3\} = \{s_0, s_2, s_1\}$. Thus,
$\Psi(0, u) = \{(c, v_2, v_1, v_0)\}$. In a similar way, we can find the other two partitions.
Therefore, the three partitions $\{\Psi(i, t) \mid 0 \le t \le 2^{3-i}-1\}$, $0 \le i \le 2$, corresponding to
the three sets of subnetworks $\{\Phi[i, t] \mid 0 \le t \le 2^{3-i}-1\}$, $0 \le i \le 2$, are

{Ψ(0, u)}

= {{(c, 0, 0, 0)}, {(c, 0, 0, 1)}, {(c, 0, 1, 0)}, {(c, 0, 1, 1)},
   {(c, 1, 0, 0)}, {(c, 1, 0, 1)}, {(c, 1, 1, 0)}, {(c, 1, 1, 1)}}

= {{0, 8}, {1, 9}, {2, 10}, {3, 11}, {4, 12}, {5, 13}, {6, 14}, {7, 15}},

{Ψ(1, u)}

= {{(c, 0, 0, c)}, {(c, 0, 1, c)}, {(c, 1, 0, c)}, {(c, 1, 1, c)}}

= {{0, 1, 8, 9}, {2, 3, 10, 11}, {4, 5, 12, 13}, {6, 7, 14, 15}},

{Ψ(2, u)}

= {{(c, c, 0, c)}, {(c, c, 1, c)}}

= {{0, 1, 4, 5, 8, 9, 12, 13}, {2, 3, 6, 7, 10, 11, 14, 15}}.
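The three partitions listed above can be reproduced mechanically by expanding each address label, with every `c` standing for a free bit. The following is a minimal sketch, not part of the original text; the helper names `expand_label` and `partition` are hypothetical, and labels are assumed to be written MSB first.

```python
from itertools import product


def expand_label(label):
    """Return the sorted addresses matched by a label such as ('c', 0, 0, 'c').

    The label is written MSB first; 'c' marks a free bit.
    """
    free = [pos for pos, bit in enumerate(label) if bit == "c"]
    addresses = []
    for fill in product((0, 1), repeat=len(free)):
        bits = list(label)
        for pos, value in zip(free, fill):
            bits[pos] = value
        addresses.append(int("".join(map(str, bits)), 2))
    return sorted(addresses)


def partition(free_bits, n=4):
    """Blocks of the partition whose free ('c') bits sit at the given bit
    indices (bit 0 = LSB), ordered by the value of the fixed bits."""
    fixed = [b for b in range(n - 1, -1, -1) if b not in free_bits]
    blocks = []
    for values in product((0, 1), repeat=len(fixed)):
        label = ["c"] * n
        for b, v in zip(fixed, values):
            label[n - 1 - b] = v  # position in the MSB-first label
        blocks.append(expand_label(tuple(label)))
    return blocks


if __name__ == "__main__":
    for free in ({3}, {3, 0}, {3, 2, 0}):
        print(sorted(free, reverse=True), partition(free))
```

Calling `partition({3})`, `partition({3, 0})` and `partition({3, 2, 0})` corresponds to the label patterns (c, v, v, v), (c, v, v, c) and (c, c, v, c) used in the example.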

THEOREM 13: The permutation $\pi$ is admissible on $\Gamma$ iff $\{I^{-1}(D(j)) \mid j \in \Psi(i, t)\}$ is
CRS($2^{i+1}$) or $\{|I^{-1}|(D(j)) \mid j \in \Psi(i, t)\}$ is CRS($2^{i+1}$), for all $0 \le i \le n-1$ and
$0 \le t \le 2^{n-i-1}-1$.

PROOF: This proof is similar to those of Theorems 6 and 7. Note that any permutation
function $\beta \in \{\beta_n\}$ is closed on domain $\{0, 1, \ldots, N-1\}$, i.e.,
$\{\beta(i) \mid 0 \le i \le N-1\} = \{0, 1, \ldots, N-1\}$. Thus, it is clear that
$\{I^{-1}(D(j)) \mid j \in \Psi(i, t)\}$ is CRS($2^{i+1}$) iff $\{|I^{-1}|(D(j)) \mid j \in \Psi(i, t)\}$ is CRS($2^{i+1}$).
Therefore, the condition that $\{I^{-1}(D(j)) \mid j \in \Psi(i, t)\}$ is CRS($2^{i+1}$) gives the necessary
and sufficient condition for non-conflict at any switching element(s) of subnetwork $\Phi[i, t]$. □
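The test in Theorem 13 can be sketched in code as follows. This is only an illustration under stated assumptions: CRS(m) is read as "complete residue system modulo m", the partitions are supplied by the caller as a dictionary from stage index `i` to its list of blocks, and `i_inv` stands in for the function $I^{-1}$ (defaulting to the identity). The function names are hypothetical.

```python
def is_crs(values, modulus):
    """True if the values hit every residue 0..modulus-1 exactly once."""
    return sorted(v % modulus for v in values) == list(range(modulus))


def satisfies_crs_condition(dest, partitions, i_inv=lambda d: d):
    """Check the CRS condition for every block of every partition.

    dest[j]        destination D(j) of network input j
    partitions[i]  list of blocks Psi(i, t), each a list of inputs
    i_inv          assumed stand-in for I^{-1}
    """
    for i, blocks in partitions.items():
        modulus = 2 ** (i + 1)
        for block in blocks:
            if not is_crs([i_inv(dest[j]) for j in block], modulus):
                return False
    return True


if __name__ == "__main__":
    # Toy 4-input partitions, chosen only to exercise the functions.
    toy = {0: [[0, 2], [1, 3]], 1: [[0, 1, 2, 3]]}
    print(satisfies_crs_condition([0, 2, 1, 3], toy))
    print(satisfies_crs_condition([0, 1, 2, 3], toy))
```

The toy partitions are not taken from any network in the text; they only show how a block whose destinations collide modulo $2^{i+1}$ makes the check fail.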

THEOREM  14:  The  permutation  k  is  admissible  on  f  iff  ^  [f  0 ))](  = 

je  'i'(i.t) 

£  [p-OCD-^O))].  =2‘  or  £  [lrH(DO))]i  =  S  [p-l  O  I  2‘ ,  for  all 

ye  y«0,/)  j€  'f'(i.t)  Je  '¥g(i.t) 

0<i<n-land0<[<  -  1. 

Proof:  This  proof  is  similar  to  that  of  Theorem  8.  It  can  be  proved  that  the  condition 

I  =  I  [//TkD-^C/))].-  =  I  [pO(D-^Um  =  for  any  i  and 

ye  'i'(i.i)  ye  Je  4',(i,0 

$t$, is sufficient to derive that $\{I^{-1}(D(j)) \mid j \in \Psi(i, t)\}$ is a CRS($2^{i+1}$), for any $i$ and $t$, which
in turn implies that the permutation is admissible on $\Gamma$. □

For  example,  in  Fig.  4.6,  we  also  show  why  a  permutation  (12,  4,  14,  6,  13,  5,  15,  7,  8, 
0,  10,  2,  9,  1,  11,  3)  is  admissible  on  this  network. 

THEOREM 15: Let $\Gamma_x$ and $\Gamma_y$ be two networks specified by characteristic functions $O_x$,
$I_x$ and $O_y$, $I_y$, respectively. $\Gamma_x$ and $\Gamma_y$ are in a subclass of equivalent networks with the
same set of admissible permutations iff $|O_x| = |O_y|$ and $|I_x| = |I_y|$.

Proof: This proof is based on Theorems 12, 13 and 14. According to Theorem 12,
function $|O|$ uniquely determines all the $n$ partitions $\{\Psi(i, t) \mid 0 \le t \le 2^{n-i-1}-1\}$,
$0 \le i \le n-1$. And according to Theorems 13 and 14, for any permutation $D$, the results of
bit-summation and comparison operations performed on set $\{I^{-1}(D(j)) \mid 0 \le j \le N-1\}$ are the
same as those on $\{|I^{-1}|(D(j)) \mid 0 \le j \le N-1\}$. Thus, any two networks with the same
absolute characteristic functions have the same set of admissible permutations. □

Theorem 15 provides a direct view of equivalence between networks by the set of admissible
permutations. Some authors [Par80] [Agr83] denoted it as functional equivalence. That
is, if two networks can realize the same set of permutations, then they are functionally
equivalent. According to Theorem 15, for any pair of functions $O, I \in \{\beta_n\}$, if $|O| = O$ and
$|I| = I$, there exists a subclass of functionally equivalent networks with the same set of
admissible permutations which are characterized by $O$ and $I$. Since there are $n! = (\log_2 N)!$ of such
function $O$'s or $I$'s, it is easy to show that the whole topologically equivalent class of
networks can be partitioned into $((\log_2 N)!)^2$ disjoint subclasses. In any subclass, each network is
not only a different drawing of another network but also realizes the same set of permutations.

For example, in Table 4.1, the general form of the characteristic functions and the set of
partitions of several famous networks are shown. From Table 4.1, we obtain the following
facts. The Baseline and inverse Baseline networks have the same set of admissible
permutations. The Omega and inverse Indirect Binary Cube networks have the same set of admissible
permutations. The Indirect Binary Cube and inverse Omega networks have the same set of
admissible permutations.
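The three facts above amount to grouping networks whose pairs of characteristic functions coincide, as Theorem 15 prescribes. A minimal sketch of that grouping follows; the entries are transcribed from Table 4.1 for the six networks named in the three statements only, and the string constants `ASC` and `DESC` are merely shorthand introduced here for the two bit orders appearing in the table.

```python
from collections import defaultdict

# Shorthand (an assumption of this sketch) for the two orders in Table 4.1.
ASC = "(0, 1, ..., n-2, n-1)"
DESC = "(n-1, n-2, ..., 1, 0)"

# (O, I) pairs as read from Table 4.1.
TABLE = {
    "Omega": (ASC, ASC),
    "inverse Omega": (DESC, DESC),
    "Baseline": (DESC, ASC),
    "inverse Baseline": (DESC, ASC),
    "Indirect Binary Cube": (DESC, DESC),
    "inverse Indirect Binary Cube": (ASC, ASC),
}


def equivalence_subclasses(table):
    """Group network names by identical (O, I) pairs."""
    groups = defaultdict(list)
    for name, functions in table.items():
        groups[functions].append(name)
    return sorted(sorted(names) for names in groups.values())


if __name__ == "__main__":
    for group in equivalence_subclasses(TABLE):
        print(group)
```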

In Parker's work [Par80], the functional equivalence of three networks (i.e., the inverse
Omega, Indirect Binary Cube and R-network) is proved. Identity relations between several
specific permutation functions are used to transform one network into another. Even though,
conceptually, the method can be generalized (which in our opinion would be very complicated)
to prove the functional equivalence of other networks, it restricts our view to a one-dimensional
solution, like that in condition (2) of Theorem 5, for outlining what the permutations that a
network can realize really look like. It is clear that our method provides a two-dimensional
solution by simple bit relations and a more direct insight than that in [Par80] to describe the
Table 4.1. The general forms of the characteristic functions of several famous networks.

Network                        O                       I                       Ψ(i, u) = {(v_{n-1}, ..., v_1, v_0)}
-----------------------------  ----------------------  ----------------------  ------------------------------------
Delta                          (1, 2, ...              (0, 1, ..., n-2, n-1)   {(..., v_1, c)}
inverse Delta                  (n-1, n-2, ..., 1, 0)   (0, n-1, ..., 2, 1)     {(v_{n-1}, ..., v_{i+1}, c, ..., c)}
Omega                          (0, 1, ..., n-2, n-1)   (0, 1, ..., n-2, n-1)   {(c, ..., c, v_{n-i-2}, ..., v_0)}
inverse Omega                  (n-1, n-2, ..., 1, 0)   (n-1, n-2, ..., 1, 0)   {(v_{n-1}, ..., v_{i+1}, c, ..., c)}
Baseline                       (n-1, n-2, ..., 1, 0)   (0, 1, ..., n-2, n-1)   {(v_{n-1}, ..., v_{i+1}, c, ..., c)}
inverse Baseline               (n-1, n-2, ..., 1, 0)   (0, 1, ..., n-2, n-1)   {(v_{n-1}, ..., v_{i+1}, c, ..., c)}
Indirect Binary Cube           (n-1, n-2, ..., 1, 0)   (n-1, n-2, ..., 1, 0)   {(v_{n-1}, ..., v_{i+1}, c, ..., c)}
inverse Indirect Binary Cube   (0, 1, ..., n-2, n-1)   (0, 1, ..., n-2, n-1)   {(c, ..., c, v_{n-i-2}, ..., v_0)}

meaning of functional equivalence.

4.6.  SUMMARY 

In this chapter, by employing a proper partitioning scheme, the properties of a number of
permutable substructures (subnetworks) on an Omega network are studied. These substructures
are associated with some specific partitions on the network inputs and can be used to
characterize admissible permutations of an Omega network. Based on the understanding of
these substructures, the permutation capability of Omega networks is characterized by
using either the residue classes or the bit relations of destination tags. We propose an algorithm
that determines the admissibility of a permutation on an Omega network with time
complexity O(N), where N is the number of inputs/outputs of the network. Finally, we
generalize the methodology used on Omega networks to a class of topologically equivalent
networks defined by BPC permutations in which each network can be specified by two
characteristic functions.
CHAPTER  5 


A  FAULT-TOLERANT  RECONFIGURATION  SCHEME 
FOR  MULTIPROCESSORS 


5.1.  INTRODUCTION 

One of the most cost-effective ways of interconnecting a very large number of processors
to form a general-purpose multiprocessor system is to employ a Multistage Interconnection
Network (MIN) [WuFe81] [Par80] [Pea77]. In such a system, the MIN, which is a critical
component, provides full access communication between processors. However, physical
failures in a MIN can cause severe degradation in the system performance, unless efficient
methods are provided to handle them.

Various issues concerning the analysis of fault tolerance capability and reliability of
multiprocessor systems with MINs have been studied in [GaMa88] [DaBh85]. In one of these
methods, the failure of a switching element in the network causes the removal of a number of
processors such that the system can operate in a degraded mode in which the full access
property can be maintained among the remaining processors. However, this strategy results in an
enormous waste of computational resources. As a MIN is used for interprocessor connection,
an alternate strategy to minimize the loss of computational resources is to allow communication
with multiple passes through the faulty network by using the remaining fault-free paths.
A multiprocessor system with a faulty network is said to possess the dynamic full access
(DFA) property if each processor in the system can communicate with any other processor in
the system in a finite number of passes through the faulty network, by routing the data
through proper intermediate processors if necessary [ShHa84] [VaRa89]. This strategy results
in a reconfigured system which can operate in a gracefully degraded mode at the expense of
routing overhead, increased latency, and additional blocking due to the loss of
communication paths. As the studies in [ShHa84] [VaRa89] [AgLe85] have shown, the general
problem of determining the DFA property of a faulty network is as hard as a transitive closure
problem. No general necessary and sufficient conditions have yet been found to determine the
DFA property based on the distribution of faulty components in the network.
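The link between the DFA property and transitive closure can be made concrete with a small sketch. It assumes that a Boolean matrix `one_pass` is already available, where `one_pass[i][j]` is true when processor `i` can reach processor `j` in a single pass through the faulty network; how that matrix is obtained is not covered here. The code uses Warshall's algorithm, a standard textbook method rather than anything proposed in this chapter, and the function names are hypothetical.

```python
def transitive_closure(one_pass):
    """Warshall's algorithm on a Boolean adjacency matrix."""
    n = len(one_pass)
    closure = [row[:] for row in one_pass]
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = True
    return closure


def has_dfa(one_pass):
    """True if every processor reaches every other one in finitely many passes."""
    closure = transitive_closure(one_pass)
    n = len(one_pass)
    return all(closure[i][j] for i in range(n) for j in range(n) if i != j)


if __name__ == "__main__":
    T, F = True, False
    ring = [[F, T, F], [F, F, T], [T, F, F]]      # 0 -> 1 -> 2 -> 0
    cut_off = [[F, T, F], [T, F, F], [T, F, F]]   # nobody can reach 2
    print(has_dfa(ring), has_dfa(cut_off))
```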

A  successful  survival  of  a  multiprocessor  system  in  the  presence  of  network  failures 
requires  solutions  of  the  following  problems. 

(1)  A  fast  and  effective  fault  testing  algorithm  to  detect  failures  of  the  network. 

(2)  A  multi-fault  diagnosis  algorithm  to  locate  all  the  faults. 

(3)  A  real-time  reconfiguration  scheme  to  prevent  the  waste  of  additional  computational 
effort. 

Several studies of fault testing and diagnosis algorithms can be found in [Agr82]
[ThNe83] [WuFe79] [NaSo80] [FuAb83] [Agr80] [FaPr81]. In this chapter, we address only
problem (3). We assume that the locations of all the faulty components in the
network are known. Central to the design of such a reconfiguration scheme is the utilization
of this given information to reconfigure the system into a single (sub)system or several
subsystems with the DFA property such that the original routing scheme can be preserved in each
subsystem. The fault-tolerant reconfiguration scheme to be presented is suitable for
on-line and real-time applications. The scheme is simple, efficient, and applicable to all the
networks discussed in the literature. A special network topology, the Omega network [Law75],
is used as the example network in this chapter. Several important problems which have not
been previously considered are addressed in our work and discussed in reference to an
integrated model. Those which distinguish this chapter from previous work [ShHa84] [VaRa89]
[AgLe85] are summarized as follows:

(1)  In many faulty situations, some processors might be completely isolated from other
processors (i.e., no fault-free paths exist between them and other processors). If this
information is not known, the data communication to/from these processors will block other
fault-free communication paths and significantly degrade system performance. Therefore,
it is extremely important for the system to obtain this information in order to disable
these processors and obtain better communication load control. In this chapter, this
information, which is missing in previous work, is obtained.

(2)  In [ShHa84] [VaRa89] [AgLe85], the authors were only interested in determining the
sufficient conditions for a faulty network to possess the DFA property. In this chapter,
we will show that even if the original system does not have the DFA property due to
faults in the network, the surviving system obtained after disabling those processors
defined in (1) may have the DFA property. Since there exist many possible multipass
communication paths between surviving processors (those processors in the surviving
system), an efficient way to achieve low latency communication is to utilize
shortest-path routes between these processors. While Varma and Raghavendra mentioned
in [VaRa89] that this is a very important issue, they did not actually show
how to do it. In this chapter, a shortest-path fault-tolerant routing scheme is developed
such that by routing through proper intermediate processors a processor can access
another processor with a minimal number of passes through the faulty network.

(3)  Since an acknowledge signal and bidirectional data communication are always required,
it is necessary that bidirectional communication paths exist between any two processors.
However, in some situations, there may exist only unidirectional communication paths
between two processors. Such situations, as we will show, are due to the non-DFA
property of the surviving system. The use of such unidirectional communication paths will
cause a deadlock because no acknowledge signal can be received by the
source processor. Therefore, the utilization of shortest-path routes alone may not be
sufficient to keep a system operational. An algorithm to prevent deadlocks must also be
employed. In this chapter, such an algorithm is proposed; it partitions the
surviving system into several surviving subsystems, each of which
is a maximal subset of processors that possesses the DFA property.
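The notion of a "maximal subset of processors which possesses the DFA property" in item (3) coincides with a strongly connected component of the one-pass reachability graph. The sketch below illustrates that idea only; it is not the algorithm developed later in this chapter, and the input format (a dictionary `one_pass` mapping each surviving processor to the set of processors it can reach in one pass) and function names are assumptions made for the example.

```python
def reachable(one_pass, source):
    """All processors reachable from source in any number of passes."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for nxt in one_pass[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def maximal_dfa_subsystems(one_pass):
    """Partition processors into groups that can reach each other both ways."""
    reach = {s: reachable(one_pass, s) for s in one_pass}
    remaining, groups = set(one_pass), []
    while remaining:
        s = min(remaining)
        # Keep only processors with a route to s AND a route back from s.
        group = {t for t in remaining if t in reach[s] and s in reach[t]}
        groups.append(sorted(group))
        remaining -= group
    return groups


if __name__ == "__main__":
    # 1 -> 2 is a one-way route: an acknowledge from 2 can never get back to 1.
    one_pass = {0: {1}, 1: {0, 2}, 2: {3}, 3: {2}}
    print(maximal_dfa_subsystems(one_pass))
```

In the toy input, processors 0 and 1 can send to 2 and 3 but never receive an acknowledgement, so they end up in a different subsystem.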

In summary, the fault-tolerant reconfiguration scheme to be presented provides a flexible
reconfigurable environment for a multiprocessor system with a faulty network. Under such an
environment, the communication of the surviving system is operated by using the information
of shortest-path routes. The rest of this chapter is organized as follows. In Section 5.2, the
system and fault models are presented. In Section 5.3, the fault-tolerant reconfiguration
scheme is presented. This scheme contains five parts: routing behavior of Omega networks
under faults, communication capability for the first pass under faults, construction of the
surviving system, construction of shortest-path routing tables, and reconfiguration of the
surviving system. In Section 5.4, the time complexity of our scheme is analyzed. Finally,
Section 5.5 gives the summary of this chapter.

5.2.  SYSTEM  AND  FAULT  MODELS 

5.2.A.  System  Model 

In this chapter, without loss of generality, we limit our discussion to an $N$-processor
system interconnected by an Omega network [Law75] built with 2x2 switching elements (see
Fig. 5.1). Such a multiprocessor system is connected to and monitored by a front-end host
computer. The overall system configuration can be either an SIMD or an MIMD structure,
depending on the requirements of specific applications. An $N \times N$ Omega network with $N$ network
inputs and $N$ network outputs consists of $n = \log_2 N$ stages of 2x2 switching elements. Each
stage consists of $N/2$ switching elements and the interconnection pattern between stages is the
perfect shuffle permutation. Each switching element allows point-to-point or broadcast
communication from its input ports to output ports if no conflict occurs. For example, in Fig. 5.2,
a 16-processor multiprocessor system connected by a 16 x 16 Omega network is shown. The
following conventional notations are used throughout this chapter. The stages of the network
are numbered from 0 through $n-1$ from left to right and the input/output ports (including
network inputs/outputs) of switching elements at each stage are numbered from 0 through
$N-1$ from top to bottom. The communication links between stages are numbered according to
the order of input ports of the stage to which these links are connected. The communication
links before stage 0 and after stage $n-1$ are considered as pseudo links since they are
connected to output ports and input ports of processors, respectively. For example, the labels of
links connected to the input ports of stage $i$ are shown in Fig. 5.2. Stage 0 is sometimes
referred to as the input stage and stage $n-1$ as the output stage. Each network input and
output are connected to the output and input ports of a processor with the same address. The
address of a label $L$ is represented by its binary form $L = (l_{n-1}, \ldots, l_1, l_0)$, where bit $l_0$ is the
least significant bit (LSB). Thus, a switching element in the network is represented by an
ordered pair $(t, e)$, where $t$ ($0 \le t \le n-1$) is the stage at which the element is located and
$e = (e_{n-1}, \ldots, e_1)$ ($0 \le e \le N/2-1$) is the element address at that stage such that the two
input/output ports of $e$ have a common address label equal to $(e_{n-1}, \ldots, e_1, c)$, $c = 0$ or 1. A
communication link is represented by an ordered pair $[t, h]$, where $h = (h_{n-1}, \ldots, h_1, h_0)$
($0 \le h \le N-1$) is the address of this link and $t$ ($1 \le t \le n-1$) is the stage at which the input
port of a switching element $(h_{n-1}, \ldots, h_1)$ is connected to this link. Also the sets $\{[0, h]\}$ and
$\{[n, h]\}$ are used to represent those pseudo links before stage 0 and after stage $n-1$. A set
of labels with similar address representations can be denoted by a common address label. For
example, $(l_{n-1}, l_{n-2}, \ldots, l_i, c, \ldots, c)$ where $\#(c) = i$ (i.e., the total number of $c$'s is $i$)
represents those $2^i$ labels with the same first $n-i$ bits in their addresses. The notation
$L[a : b]$, $a \ge b$, is used to represent a segment of the address $L$ from bit $l_a$ to bit $l_b$, i.e.,
$L[a : b] = (l_a, l_{a-1}, \ldots, l_b)$.
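The notation above translates directly into bit operations. The sketch below is illustrative only: it assumes the usual destination-tag routing of an Omega network (at stage j the switching element sets the LSB of the port address to destination bit $d_{n-1-j}$), which this section does not spell out, and the function names `segment`, `shuffle` and `omega_path` are hypothetical.

```python
def segment(label, a, b):
    """L[a : b]: bits a down to b of label (a >= b), returned as an integer."""
    return (label >> b) & ((1 << (a - b + 1)) - 1)


def shuffle(x, n):
    """Perfect shuffle of an n-bit address: rotate left by one bit."""
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)


def omega_path(src, dst, n):
    """Input/output port pairs (S^j, D^j) visited at stages j = 0 .. n-1.

    Assumes destination-tag routing, most significant destination bit first.
    """
    ports, port = [], src
    for j in range(n):
        s_in = shuffle(port, n)                          # link into stage j
        d_out = (s_in & ~1) | ((dst >> (n - 1 - j)) & 1)  # switch sets the LSB
        ports.append((s_in, d_out))
        port = d_out
    return ports


if __name__ == "__main__":
    for s_in, d_out in omega_path(5, 5, 4):
        # segment(s_in, 3, 1) is the address of the switching element used.
        print(s_in, d_out, segment(s_in, 3, 1))
```

Under these assumptions the element traversed at stage j is `segment(S_j, n - 1, 1)`, matching the convention that the two ports of element $e$ share the label $(e_{n-1}, \ldots, e_1, c)$.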

5.2.B.  Fault  Model 

The fault model we consider is one in which both the switching elements and communication
links may fail. The faulty components (switching elements and/or communication links) are
treated as unusable and no connection can be routed through them. Thus, a fault set F on
an Omega network is defined as a set of faulty components under consideration. Some
standard definitions have been used in several previous studies [ShHa84] [VaRa89]. We quote
them here and extend their definitions for our convenience.

Fig. 5.2. A 16-processor multiprocessor system interconnected by a 16 x 16 Omega network.

DEFINITION 1: A fault set F in an Omega network is critical with respect to a
subsystem (system) iff it destroys the property of dynamic full access of this subsystem (system). □

Here, the definition of the DFA property is no longer restricted to the original
N-processor system but applies to any subsystem composed of a subset of the N processors. Hence, if
F is noncritical, data packets from any processor (or network input) can be routed to any
other processor (or network output) in a finite number of passes through the faulty network.

DEFINITION 2: Let $\pi$ be a permutation passable by an Omega network. The set of
faulty paths $C_{F,\pi}$ of the permutation $\pi$ under the fault set $F$ is the set of communication paths
that pass through some components in $F$ when the system tries to realize $\pi$ on the network. □

For example, let $I$ be the identity permutation. The fault set $F$ = {(1,0), (2,0), (2,1),
[1,1], [1,8]} in the network of Fig. 5.2 will affect those data packets which pass through the
paths 0 → 0, 2 → 2, 4 → 4, and 6 → 6 of the identity permutation.

DEFINITION 3: Two fault sets $F$ and $F'$ are equivalent iff $C_{F,\pi} = C_{F',\pi}$ for all possible
$\pi$. The notation $F \equiv F'$ is used to denote that $F$ and $F'$ are equivalent. □

For example, the fault set $F'$ = {(1,0), (1,4), (2,0), (2,1), [1,0], [1,1], [1,8]} is equivalent
to $F$ = {(1,0), (2,0), (2,1), [1,1], [1,8]} in the network of Fig. 5.2 because $F$ and $F'$ affect the
same set of communication paths of each passable permutation.
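The two examples following Definitions 2 and 3 can be checked with a short simulation. This is a sketch under assumptions not stated in this section: destination-tag routing with the most significant destination bit first, switching elements encoded as `("E", stage, e)` and links as `("L", stage, h)`, and equivalence tested over every source/destination pair, which is sufficient to cover every passable permutation. All function names are hypothetical.

```python
def shuffle(x, n):
    """Perfect shuffle of an n-bit address: rotate left by one bit."""
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)


def components_on_path(src, dst, n):
    """Switching elements and inter-stage links used by the path src -> dst."""
    used, port = set(), src
    for j in range(n):
        s_in = shuffle(port, n)
        if j >= 1:                      # links [t, h] exist for t >= 1 only
            used.add(("L", j, s_in))
        used.add(("E", j, s_in >> 1))   # element owning input port s_in
        port = (s_in & ~1) | ((dst >> (n - 1 - j)) & 1)
    return used


def faulty_paths(perm, faults, n):
    """Paths (s, perm[s]) that touch at least one faulty component."""
    return {(s, d) for s, d in enumerate(perm)
            if components_on_path(s, d, n) & faults}


def equivalent(f1, f2, n):
    """True if f1 and f2 block exactly the same source/destination paths."""
    size = 1 << n
    for s in range(size):
        for d in range(size):
            used = components_on_path(s, d, n)
            if bool(used & f1) != bool(used & f2):
                return False
    return True


if __name__ == "__main__":
    F = {("E", 1, 0), ("E", 2, 0), ("E", 2, 1), ("L", 1, 1), ("L", 1, 8)}
    F_prime = F | {("E", 1, 4), ("L", 1, 0)}
    print(sorted(faulty_paths(list(range(16)), F, 4)))
    print(equivalent(F, F_prime, 4))
```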

DEFINITION 4: The maximal fault set $F_{max}$ corresponding to a fault set $F$ in an Omega
network is the set of maximum size that is equivalent to $F$. That is to say, $F_{max} \equiv F$ and
$F_{max} \supseteq F'$, for all $F' \equiv F$. □

5.3.  FAULT-TOLERANT  RECONFIGURATION 

In this section, we study a fault-tolerant reconfiguration scheme for an N-processor
system with multiple faults on its Omega network. Such a scheme provides the system with a flexible
reconfigurable environment whether or not the fault set under consideration is
critical. A single surviving system or several surviving subsystems are formed by performing this
scheme such that deadlocks can be avoided. This single surviving system may be composed
of the original N processors or only a subset of them. A shortest-path routing table for each
processor is obtained from which a processor can always know the minimal number of passes
and proper intermediate processors to access other processors in the same surviving system
(subsystem). The idea of the fault-tolerant reconfiguration is described as follows. In Fig.
5.1, imagine that there is a mechanism in the host computer, named the
reconfiguration monitor, which can process the fault-tolerant reconfiguration scheme. Once
the locations of faulty components are known, the reconfiguration monitor
carries out the following procedures.

(1)  Obtain the communication capability of each processor for the first pass through the
faulty network. Some processors may be considered as unusable due to the complete
destruction of their communication capabilities. Conceptually, a single surviving system
will be produced if we remove these unusable processors.

(2)  The communication capabilities of all the processors are sent back from the
reconfiguration monitor to each processor. A shortest-path routing algorithm is
performed in each processor to find all the possible shortest routes to other processors.
Eventually, a shortest-path routing table is produced for each processor. By using this
algorithm, the proper intermediate processors and the minimal number of passes through
the faulty network for a processor to access another processor are obtained.

(3)  Under some situations, a reconfiguration algorithm must be employed to avoid the
implicit danger of deadlocks. These situations are due to the criticality of the fault set with
respect to the surviving system. According to this reconfiguration algorithm, the
surviving system is partitioned into several subsystems. Each subsystem possesses the DFA
property. The partitioning of the surviving system is implicitly equivalent to sacrificing
some usable components which only help establish unidirectional multi-pass
communication paths. However, we do not have to know the actual locations of these usable
components during the partitioning.
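Step (2) above can be illustrated with a breadth-first search over the first-pass communication capabilities. The sketch is not the routing algorithm developed later in this chapter; it assumes the capabilities arrive as a dictionary `one_pass` that maps each processor to the set of processors it can reach in one pass, and the name `routing_table` is hypothetical.

```python
from collections import deque


def routing_table(one_pass, source):
    """Map each reachable destination to (minimal passes, first hop).

    The first hop is the first intermediate processor on a shortest route,
    or the destination itself when one pass is enough.
    """
    table = {}
    queue = deque((nxt, 1, nxt) for nxt in sorted(one_pass[source])
                  if nxt != source)
    while queue:
        node, passes, first_hop = queue.popleft()
        if node in table:
            continue                      # already reached with fewer passes
        table[node] = (passes, first_hop)
        for nxt in sorted(one_pass[node]):
            if nxt not in table and nxt != source:
                queue.append((nxt, passes + 1, first_hop))
    return table


if __name__ == "__main__":
    one_pass = {0: {1}, 1: {2}, 2: {0, 3}, 3: {0}}
    print(routing_table(one_pass, 0))
```

Breadth-first order guarantees that the first time a destination is dequeued, the recorded number of passes is minimal.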

We  start  our  discussion  from  basic  properties  of  routing  behavior  of  Omega  networks 
under  faults. 

5.3.A.  Routing  Behavior  of  Omega  Networks  under  Faults 

For an $n$-stage Omega network, due to its regular structure, any path traversing a
switching element $(i, e)$ at stage $i$ can be expressed by the following two transition sequences.
These transition sequences also indicate which network inputs $S$ and outputs $D$ are
connected through $e$.

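Backward:

$$\begin{aligned}
S^{i} &= (e_{n-1}, e_{n-2}, \ldots, e_1, c) \\
D^{i-1} &= (c, e_{n-1}, \ldots, e_2, e_1) \\
S^{i-1} &= (c, e_{n-1}, \ldots, e_2, c) \\
&\;\;\vdots \\
S^{0} &= (c, \ldots, c, e_{n-1}, \ldots, e_{i+1}, c) \\
S &= (c, \ldots, c, e_{n-1}, \ldots, e_{i+2}, e_{i+1})
\end{aligned}$$

Forward:

$$\begin{aligned}
D^{i} &= (e_{n-1}, e_{n-2}, \ldots, e_1, c) \\
S^{i+1} &= (e_{n-2}, e_{n-3}, \ldots, e_1, c, e_{n-1}) \\
D^{i+1} &= (e_{n-2}, e_{n-3}, \ldots, e_1, c, c) \\
&\;\;\vdots \\
S^{n-1} &= (e_i, e_{i-1}, \ldots, e_1, c, \ldots, c, e_{i+1}) \\
D^{n-1} &= (e_i, e_{i-1}, \ldots, e_1, c, \ldots, c, c) \\
D &= (e_i, e_{i-1}, \ldots, e_1, c, \ldots, c, c)
\end{aligned}$$
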

In these transition sequences, each $S^j$, $0 \le j \le n-1$, represents the address of the input
port through which a path traverses stage $j$ and each $D^j$, $0 \le j \le n-1$, the output port
through which the path traverses stage $j$. Obviously, $S^j[n-1 : 1] = D^j[n-1 : 1]$ is the
address of the switching element through which a path traverses stage $j$ and each one of such
paths passes through the switching element $e$ at stage $i$ (i.e., when $j = i$). The switching
element $e$ at stage $i$ can be viewed as the common root of two communication binary trees.
One is the backward $(i+1)$-level tree with the address label of its leaves (switching elements
at input stage) equal to $(c, \ldots, c, e_{n-1}, \ldots, e_{i+1})$, $\#(c) = i$, and the address label of
network inputs connected to leaves equal to $S = (c, \ldots, c, e_{n-1}, e_{n-2}, \ldots, e_{i+1})$, $\#(c) = i+1$.
The other one is the forward $(n-i)$-level tree with the address label of its leaves (switching
elements at output stage) equal to $(e_i, e_{i-1}, \ldots, e_1, c, \ldots, c)$, $\#(c) = n-i-1$, and the
address label of network outputs connected to leaves equal to $D = (e_i, e_{i-1}, \ldots, e_1, c, \ldots, c)$,
$\#(c) = n-i$. Thus, in total $2^{i+1}$ network inputs and $2^{n-i}$ network outputs are connected
through $e$. If $e$ is faulty, clearly, the communication from these $2^{i+1}$ network inputs to those
$2^{n-i}$ network outputs will be destroyed. In particular, when $i = 0$ ($i = n-1$), the above
two binary trees reduce to one in which the root $e$ is rooted at the input (output) stage and the
root $e$ is connected to two network inputs $S = (c, e_{n-1}, e_{n-2}, \ldots, e_1)$ (two network outputs
$D = (e_{n-1}, e_{n-2}, \ldots, e_1, c)$) and all the network outputs (inputs). It is obvious that, as a
necessary condition for a fault set $F$ to be noncritical with respect to the original $N$-processor
system, $F$ cannot contain any switching elements from the input or output stages; otherwise
communication trees rooted at these switching elements are completely destroyed. If that happens,
processors connected to faulty switching elements at input or output stages can no longer be
used.

It can also be observed that the pair of switching elements $\{(i, w), (i, w + 2^{n-2})\}$ at
stage i is connected to only one pair of switching elements $\{(i+1, 2w), (i+1, 2w+1)\}$ at
the next stage, where $0 \le w \le N/4 - 1$. This is referred to as the buddy property in [Agr83].
If both elements of a buddy pair $\{(i, w), (i, w + 2^{n-2})\}$ are in a fault set F, then F can be
expanded to include the elements $\{(i+1, 2w), (i+1, 2w+1)\}$ without affecting any additional
communication paths. Similarly, if both elements of $\{(i+1, 2w), (i+1, 2w+1)\}$ are in F,
then $\{(i, w), (i, w + 2^{n-2})\}$ can be included in F.
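
The buddy property can be checked numerically. The following Python sketch is only an illustration, under the assumption that the two output ports of switching element e are the positions 2e and 2e + 1 and that a perfect shuffle (a one-bit left rotation of the n-bit position) precedes every stage. The function names are hypothetical and do not come from the original text.

```python
def shuffle(pos, n):
    """Perfect shuffle: rotate the n-bit position left by one bit."""
    mask = (1 << n) - 1
    return ((pos << 1) & mask) | (pos >> (n - 1))


def successors(e, n):
    """Addresses of the next-stage switching elements fed by element e."""
    # Output ports of e are positions 2e and 2e+1; after the shuffle,
    # position p enters switching element p // 2 of the next stage.
    return {shuffle(2 * e + port, n) >> 1 for port in (0, 1)}


def buddy_targets(w, n):
    """Next-stage pair shared by the buddy pair {w, w + 2^(n-2)}."""
    quarter = 1 << (n - 2)
    if not 0 <= w < quarter:
        raise ValueError("w must satisfy 0 <= w <= N/4 - 1")
    first = successors(w, n)
    second = successors(w + quarter, n)
    assert first == second  # both buddies feed the same two elements
    return first


if __name__ == "__main__":
    # Buddy pair {(1,1), (1,5)} of a 16 x 16 network (n = 4).
    print(sorted(buddy_targets(1, 4)))
```

For n = 4 and w = 1 the sketch yields the element addresses 2 and 3, which agrees with the pair {(i+1, 2w), (i+1, 2w+1)} stated above.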

A similar argument can be made about a communication link [i, h]. The communication
link [i, h] between stage i-1 and stage i, $1 \le i \le n - 1$, can be viewed as the common
root of two communication binary trees. One is the backward (i+1)-level tree with the
address label of its leaves (i.e., processors connected to the input stage) equal to
$S = (c, \ldots, c, h_{n-1}, h_{n-2}, \ldots)$, #(c) = i. The other one is the forward
(n-i+1)-level tree with the address label of its leaves (i.e., processors connected to the
output stage) equal to $D = (\ldots, c, \ldots, c)$, #(c) = n - i. Thus, in total $2^{i}$ network
inputs and $2^{n-i}$ network outputs are connected through [i, h]. If [i, h] is faulty, clearly,
the communication from these $2^{i}$ network inputs to those $2^{n-i}$ network outputs will be
destroyed. One thing that distinguishes faulty communication links from faulty switching
elements is that no failure of any single link will completely destroy the communication
capability of any processor, even if the faulty link comes from {[1, h]} or {[n - 1, h]}.
However, the pair of links {[i, 2e], [i, 2e + 1]} is connected to the pair of input ports of
switching element e at stage i, and the pair of links
$\{[i, e \;\mathrm{div}\; 2^{n-2} + 4(e \bmod 2^{n-2})], [i, e \;\mathrm{div}\; 2^{n-2} + 4(e \bmod 2^{n-2}) + 2]\}$
is connected to the pair of output ports of switching element e at stage i-1,
$0 \le e \le N/2 - 1$. Hence, the failure of such a pair of links is equivalent to the failure
of a switching element. Thus, a fault set which contains such a pair of faulty links can include
the switching element to which the pair of faulty links is connected in order to form an
equivalent fault set.
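
As an illustration of the two link formulas, the sketch below computes the links attached to a switching element and cross-checks the output-link formula against a perfect shuffle of the element's output positions. It assumes links are written as (stage, h) pairs, with h the input position at that stage. The helper names are hypothetical.

```python
def shuffle(pos, n):
    """Perfect shuffle: rotate the n-bit position left by one bit."""
    mask = (1 << n) - 1
    return ((pos << 1) & mask) | (pos >> (n - 1))


def input_links(i, e):
    """Links [i, 2e] and [i, 2e+1] feeding the input ports of element (i, e)."""
    return [(i, 2 * e), (i, 2 * e + 1)]


def output_links(i, e, n):
    """Links leaving element (i, e); they are labeled with stage i + 1."""
    quarter = 1 << (n - 2)
    base = e // quarter + 4 * (e % quarter)
    return [(i + 1, base), (i + 1, base + 2)]


if __name__ == "__main__":
    n = 4  # 16 x 16 network
    # The faulty link pair {[1,1], [1,3]} is the output pair of element (0,4).
    print(output_links(0, 4, n))
    # The closed formula agrees with shuffling the output positions 2e, 2e+1.
    for e in range(1 << (n - 1)):
        expected = [(1, shuffle(2 * e, n)), (1, shuffle(2 * e + 1, n))]
        assert output_links(0, e, n) == expected
```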

For example, in Fig. 5.3, a fault set F = {(1,1), (1,2), (1,5), (2,0), (2,5), [1,1], [1,3]} is
shown on a 16 x 16 Omega network. The buddy pair of faulty switching elements {(1,1),
(1,5)} at stage 1 is connected to the buddy pair {(2,2), (2,3)} at stage 2. Therefore, the fault
set F can include {(2,2), (2,3)} to form an equivalent fault set. Similarly, the pair of faulty
links {[1,1], [1,3]} is connected to the switching element (0,4). Therefore, the fault set F
can include (0,4) as well to form another equivalent fault set.

5.3.B. Communication Capability for the First Pass under Faults

After the faulty components of a fault set have been located, an identical procedure is
performed in each processor to obtain the communication capability of the N-processor
system for the first pass through the faulty network. The following algorithm finds all the
accessible processors of processor i, $0 \le i \le N - 1$, for the first pass through the
faulty network. The notations $\zeta_l$ and $\zeta_{sw}$ represent a faulty link and a faulty
switching element in a fault set F, respectively. Finally, the set $Z_i$ contains all the
accessible processors of processor i.

Algorithm 1:

{ Find all the accessible processors of processor i for the first pass }
procedure Accessibility (i : processor; F : a fault set)
    { let the binary representation of i be (i_{n-1}, i_{n-2}, ..., i_0) }
    k <- 0
    Z_i <- {0, 1, ..., N - 1}    { initially, Z_i contains all the N processors }
    for each ζ_l, ζ_sw ∈ F do
        mark[ζ_l] <- True
        mark[ζ_sw] <- True
    { scan all the faulty components stage by stage }
    while k <= n - 1 and Z_i ≠ ∅ do
        { delete those inaccessible processors due to the faulty links }
        if k ≠ 0 then
            for each ζ_l = [k, (i_{n-k-2}, i_{n-k-3}, ..., i_0, x_{k-1}, ..., x_0, i_{n-k-1})] ∈ F do
                if mark[ζ_l] ≠ False then
                    Z_i <- Z_i / {(x_{k-1}, x_{k-2}, ..., x_0, c, ..., c) | #(c) = n - k}
                    for each
                        ζ_l = [l, (i_{n-l-2}, i_{n-l-3}, ..., i_0, x_{k-1}, ..., x_0,
                                   y_{l-k-1}, ..., y_0, i_{n-l-1})] ∈ F,
                        ζ_sw = (j, (i_{n-j-2}, i_{n-j-3}, ..., i_0, x_{k-1}, ..., x_0,
                                    y_{j-k-1}, ..., y_0)) ∈ F,
                        where n - 1 >= l > k, n - 1 >= j >= k do
                        { when j = k, bit y_{-1} is undefined }
                        mark[ζ_l] <- False
                        mark[ζ_sw] <- False
        { delete those inaccessible processors due to the faulty switching elements }
        for each ζ_sw = (k, (i_{n-k-2}, i_{n-k-3}, ..., i_0, x_{k-1}, ..., x_0)) ∈ F do
            if mark[ζ_sw] ≠ False then
                Z_i <- Z_i / {(x_{k-1}, x_{k-2}, ..., x_0, c, ..., c) | #(c) = n - k}
                for each
                    ζ_sw = (l, (i_{n-l-2}, i_{n-l-3}, ..., i_0, x_{k-1}, ..., x_0,
                                y_{l-k-1}, ..., y_0)) ∈ F,
                    ζ_l = [l, (i_{n-l-2}, i_{n-l-3}, ..., i_0, x_{k-1}, ..., x_0,
                               y_{l-k-1}, ..., y_0, i_{n-l-1})] ∈ F,
                    where n - 1 >= l > k do
                    mark[ζ_sw] <- False
                    mark[ζ_l] <- False
        k <- k + 1

Fig. 5.3. An example fault set F on a 16 x 16 Omega network.


Note that all the single-pass communication paths starting from processor i form a
binary tree. Algorithm 1 identifies the faulty components in this binary tree stage by stage
and deletes all the processors made inaccessible by the destruction of communication by either
faulty links or faulty switching elements. Since the failure of a component destroys the
communication from processor i to the subtree rooted at this component, any faulty components
in this subtree will not cause the deletion of any new processors. Thus, such faulty
components are marked with the value False to indicate that no deletion operations need to
be performed when they are scanned. The total number of faulty components in F is at most
$N(\log_2 N - 1) + \frac{N}{2}\log_2 N$. To identify or mark any one of them in the binary tree, a
$\log_2 N$-bit comparison operation is needed. Thus, the scanning of all the faulty components
in F takes time in $O(N(\log N)^2)$ in the worst case. Also, Algorithm 1 needs at most N deletion
operations. Therefore, Algorithm 1 takes time in $O(N(\log N)^2)$ in the worst case.

Since $Z_i$ is the set of accessible processors of processor i, $0 \le i \le N - 1$, we can
associate each processor $j \in Z_i$ with a number $[C^*]_{i,j} = 1$ and every other processor
$k \in \{0, 1, \ldots, N-1\}/Z_i$ (i.e., the set difference of $\{0, 1, \ldots, N-1\}$ and $Z_i$)
with a number $[C^*]_{i,k} = -1$. Thus, we have defined an array $[C^*]$ where $[C^*]_{i,j}$,
$i, j \in \{0, 1, \ldots, N-1\}$, denotes the entry in array $[C^*]$ at the intersection of row i
and column j. Obviously, the array $[C^*]$ represents the accessibility of each processor in the
first pass through the faulty network, i.e., processor i can communicate with processor j
iff $[C^*]_{i,j} = 1$. Note that each processor can always communicate with itself without
passing through the network. Even though the diagonal entries $[C^*]_{i,i}$ in $[C^*]$ may not
all be equal to 1, this is more convenient for our presentation.
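
The array $[C^*]$ can also be obtained by a brute-force path trace, which is useful for checking small cases. The sketch below is not Algorithm 1; it simply follows the unique path from every source to every destination. It assumes destination-tag routing (stage k sets the low bit of the position to destination bit n-1-k), a perfect shuffle before each stage, switching elements written as (stage, e) and links as (stage, h) with 1 <= stage <= n-1. All function names are hypothetical.

```python
def first_pass_accessible(i, n, faulty_elements, faulty_links):
    """Return Z_i, the set of destinations processor i reaches in one pass."""
    size = 1 << n
    mask = size - 1
    reachable = set()
    for dest in range(size):
        pos = i
        blocked = False
        for stage in range(n):
            # Perfect shuffle in front of every stage.
            pos = ((pos << 1) & mask) | (pos >> (n - 1))
            # Link [stage, pos] enters this stage (links exist for stage >= 1).
            if stage >= 1 and (stage, pos) in faulty_links:
                blocked = True
                break
            # The packet now sits at an input port of element pos // 2.
            if (stage, pos >> 1) in faulty_elements:
                blocked = True
                break
            # Exchange: the output port is chosen by the destination bit.
            bit = (dest >> (n - 1 - stage)) & 1
            pos = (pos & ~1) | bit
        if not blocked:
            reachable.add(dest)
    return reachable


def accessibility_array(n, faulty_elements, faulty_links):
    """Build [C*] as a list of rows with entries 1 (accessible) or -1."""
    size = 1 << n
    rows = []
    for i in range(size):
        z_i = first_pass_accessible(i, n, faulty_elements, faulty_links)
        rows.append([1 if j in z_i else -1 for j in range(size)])
    return rows


if __name__ == "__main__":
    # Faulty link [1,1] in a 16 x 16 network: processor 4 loses outputs 0..7.
    print(sorted(first_pass_accessible(4, 4, set(), {(1, 1)})))
```

The trace costs O(N log N) per source instead of scanning F, so it is meant for validation rather than as a replacement for Algorithm 1.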

Some processors may lose all the communication paths to or from all the N processors,
say, due to faulty switching elements at the input stage or output stage. Thus, no data
packets issued by these dead processors will arrive at other processors, or no data
packets will be received by these dead processors. They can be found by inspecting array
$[C^*]$: a processor i (or k) is dead iff $[C^*]_{i,j} = -1$ (or $[C^*]_{j,k} = -1$, respectively)
for all $0 \le j \le N - 1$. These processors must be disabled (or conceptually removed) to
avoid blocking other communication and further slowing down the system. Let $S_{sur}$ be the set
of surviving processors excluding all the dead processors, referred to as the surviving system.
Define the following two arrays, the restriction array and the connection array, to represent
the communication capability of the surviving system $S_{sur}$ for the first pass through the
faulty network.

DEFINITION 5: Array $[C_R]$, referred to as the restriction array, is defined as
$[C_R]_{i,j} = [C^*]_{i,j}$ for all $i, j \in S_{sur}$. Array $[C_S]$, referred to as the connection
array, is defined as $[C_S]_{i,i} = 1$ and $[C_S]_{i,j} = [C^*]_{i,j}$ for all $i, j \in S_{sur}$
with $i \ne j$. □

It is clear that $[C_R]$ represents the communication capability of the surviving system
without considering the self-communication capability of each surviving processor; on the
other hand, $[C_S]$ includes the self-communication capability of each surviving processor.
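
A minimal sketch of how the dead processors, the restriction array and the connection array could be derived from $[C^*]$ is given below. It assumes $[C^*]$ is stored as a list of rows with entries 1 or -1; the nested-dictionary result layout and the function name are illustrative choices, not part of the original text.

```python
def surviving_arrays(c_star):
    """Return (survivors, C_R, C_S) computed from the first-pass array [C*]."""
    size = len(c_star)
    # A processor is dead if its row (outgoing) or column (incoming) is all -1.
    dead = {
        p
        for p in range(size)
        if all(c_star[p][j] == -1 for j in range(size))
        or all(c_star[j][p] == -1 for j in range(size))
    }
    survivors = [p for p in range(size) if p not in dead]
    # Restriction array: [C*] restricted to the surviving processors.
    c_r = {i: {j: c_star[i][j] for j in survivors} for i in survivors}
    # Connection array: same entries, but the diagonal is forced to 1.
    c_s = {
        i: {j: (1 if i == j else c_star[i][j]) for j in survivors}
        for i in survivors
    }
    return survivors, c_r, c_s


if __name__ == "__main__":
    c_star = [
        [-1, 1, -1],
        [1, 1, -1],
        [-1, -1, -1],
    ]
    survivors, c_r, c_s = surviving_arrays(c_star)
    print(survivors, c_r[0][0], c_s[0][0])
```

In the toy array, processor 2 has neither outgoing nor incoming paths and is dropped, while processor 0 keeps the entry -1 on the diagonal of the restriction array and gets 1 in the connection array.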

5.3.C.  Construction  of  the  Surviving  System 

The surviving system with respect to a fault set can be constructed if we know the
corresponding maximal fault set. The principle for constructing the surviving system is to
find the maximal fault set corresponding to a fault set and then remove all the components in
the maximal fault set together with the isolated processors. The remaining substructure is the
surviving system, interconnected by the surviving network through which the communication
between the surviving processors can be maintained. The isolated processors without any
possible incoming or outgoing paths correspond to the dead processors which cannot receive
data packets or whose data packets cannot arrive at other processors, respectively. The
algorithm to construct the maximal fault set $F_{max}$ corresponding to a given fault set F can be
found in [VaRa89] by Varma and Raghavendra, where only faulty switching elements are
considered. To fit our more general fault model, where both faulty links and faulty switching
elements are considered, we generalize their algorithm as follows.

Algorithm 2:

{ Construct the maximal fault set F_max }
procedure Maximal-Fault-Set (F : a fault set)
    F_max <- F    { initially, F_max = F }
    F_add <- ∅    { initially, the set for new additional switching elements is empty }
    { include those links connected to switching elements in F;
      the output links of element (i, e) are labeled with stage i + 1 }
    for each (i, e) ∈ F do
        F_max <- F_max ∪ {[i, 2e], [i, 2e + 1],
                          [i + 1, e div 2^{n-2} + 4 (e mod 2^{n-2})],
                          [i + 1, e div 2^{n-2} + 4 (e mod 2^{n-2}) + 2]}
    { include those switching elements if the pair of links connected to their
      input ports or output ports are in F_max }
    for i = 1 to n - 1 do
        for e = 0 to N/2 - 1 do
            if {[i, 2e], [i, 2e + 1]} ⊆ F_max then
                F_add <- F_add ∪ {(i, e)}
                F_max <- F_max ∪ {(i, e)}
            if {[i, e div 2^{n-2} + 4 (e mod 2^{n-2})],
                [i, e div 2^{n-2} + 4 (e mod 2^{n-2}) + 2]} ⊆ F_max then
                F_add <- F_add ∪ {(i - 1, e)}
                F_max <- F_max ∪ {(i - 1, e)}
    { include buddy pairs of switching elements }
    { forward pass }
    for i = 1 to n - 1 do
        for w = 0 to N/4 - 1 do
            if {(i, w), (i, w + 2^{n-2})} ⊆ F_max then
                F_add <- F_add ∪ {(i + 1, 2w), (i + 1, 2w + 1)}
                F_max <- F_max ∪ {(i + 1, 2w), (i + 1, 2w + 1)}
    { reverse pass }
    for i = n - 1 downto 1 do
        for w = 0 to N/4 - 1 do
            if {(i, 2w), (i, 2w + 1)} ⊆ F_max then
                F_add <- F_add ∪ {(i - 1, w), (i - 1, w + 2^{n-2})}
                F_max <- F_max ∪ {(i - 1, w), (i - 1, w + 2^{n-2})}
    { include links connected to switching elements in F_add }
    for each (i, e) ∈ F_add do
        F_max <- F_max ∪ {[i, 2e], [i, 2e + 1],
                          [i + 1, e div 2^{n-2} + 4 (e mod 2^{n-2})],
                          [i + 1, e div 2^{n-2} + 4 (e mod 2^{n-2}) + 2]}

The maximal fault set $F_{max}$ is constructed starting from $F_{max} = F$ by

(1) adding those links connected to switching elements in F;

(2) adding those switching elements if the pair of links connected to their input ports or
output ports is in $F_{max}$;

(3) scanning the network first from the input side to the output side and then in the reverse
order, adding the buddy pairs of all the pairs already in $F_{max}$, until no more additions are
possible;

(4) including those links connected to the newly added switching elements in $F_{add}$ into
$F_{max}$.
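
The four steps can be sketched in Python as follows. This is an illustrative variant, not a transcription of Algorithm 2: it keeps switching elements and links in two separate sets, treats links as existing only between stages (labels 1 to n-1), guards the buddy passes so that no stage outside 0 to n-1 is touched, and in step (4) adds the links of every element in the final set. These choices, and all names, are assumptions made for the example.

```python
def maximal_fault_set(n, elements, links):
    """Return (faulty elements, faulty links) after steps (1) to (4).

    elements: set of (stage, e); links: set of (stage, h), 1 <= stage <= n-1.
    """
    quarter = 1 << (n - 2)

    def in_links(i, e):
        # Stage 0 elements are fed directly by processors, not by links.
        return {(i, 2 * e), (i, 2 * e + 1)} if i >= 1 else set()

    def out_links(i, e):
        # The last stage feeds processors directly.
        if i >= n - 1:
            return set()
        base = e // quarter + 4 * (e % quarter)
        return {(i + 1, base), (i + 1, base + 2)}

    f_el, f_ln = set(elements), set(links)
    # (1) links attached to the faulty switching elements
    for i, e in elements:
        f_ln |= in_links(i, e) | out_links(i, e)
    # (2) elements whose input pair or output pair of links is faulty
    for i in range(n):
        for e in range(1 << (n - 1)):
            ins, outs = in_links(i, e), out_links(i, e)
            if (ins and ins <= f_ln) or (outs and outs <= f_ln):
                f_el.add((i, e))
    # (3) buddy pairs: forward pass, then reverse pass
    for i in range(n - 1):
        for w in range(quarter):
            if {(i, w), (i, w + quarter)} <= f_el:
                f_el |= {(i + 1, 2 * w), (i + 1, 2 * w + 1)}
    for i in range(n - 1, 0, -1):
        for w in range(quarter):
            if {(i, 2 * w), (i, 2 * w + 1)} <= f_el:
                f_el |= {(i - 1, w), (i - 1, w + quarter)}
    # (4) links attached to every element now in the set
    for i, e in list(f_el):
        f_ln |= in_links(i, e) | out_links(i, e)
    return f_el, f_ln


if __name__ == "__main__":
    # Fault set of Fig. 5.3 on a 16 x 16 network (n = 4).
    f_el, f_ln = maximal_fault_set(
        4, {(1, 1), (1, 2), (1, 5), (2, 0), (2, 5)}, {(1, 1), (1, 3)}
    )
    print(sorted(f_el))
```

For the fault set of Fig. 5.3, the sketch adds the elements (0,4), (2,2) and (2,3), matching the equivalent fault sets discussed for that figure.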

It can be proved [VaRa89] that one forward pass and one backward pass are sufficient to
include all the buddy pairs deduced from F and thus obtain the corresponding $F_{max}$. By
removing all the components in $F_{max}$ and the isolated processors, we obtain the surviving
system $S_{sur}$. As the study in [VaRa89] has shown, for $N \le 16$, a fault set F is
critical with respect to the original N-processor system iff its corresponding maximal fault set
$F_{max}$ contains switching elements from the input or output stages of the network. However,
for N > 16, no general necessary and sufficient conditions based on the distribution of faulty
switching elements have been found yet to determine the criticality of a fault set F. The
condition that the corresponding maximal fault set $F_{max}$ contains no switching elements
from the input or output stages of the network is only necessary for the non-criticality of a
fault set F with respect to the original N-processor system. More precisely, we can show that
for N > 16, even though some processors are removed due to switching elements from the input
or output stages in $F_{max}$, the non-criticality of a fault set F with respect to
the surviving system $S_{sur}$ still cannot be determined. Nevertheless, we will show in the
following sections that determining the DFA property of the surviving system is not as
pessimistic as it looks. Actually, the information in the restriction array $[C_R]$ or the
connection array $[C_S]$ is sufficient for determining the DFA property and for developing an
efficient fault-tolerant routing scheme for the surviving system, and the construction of the
maximal fault set is not necessary.

For example, a 32 x 32 Omega network with a fault set F is shown in Fig. 5.4(a). To make
the surviving system easier to construct, Fig. 5.4(b) shows an alternative drawing of the Omega
network, which is a butterfly structure. The switching elements {(1,4), (1,12)}
and {(1,12), (1,13)} at stage 1 are faulty buddy pairs. As we perform Algorithm 2 and scan
the network forward and backward, two switching elements {(0,3), (4,5)} are included in
$F_{max}$ because the links connected to them are in $F_{max}$. Four other pairs of switching
elements are included in $F_{max}$ due to faulty buddy pairs, i.e., switching elements {(0,6),
(0,14), (2,8), (2,9), (3,2), (3,3), (4,6), (4,7)}, and the communication links connected to them
are included in $F_{max}$. By removing all the components in the maximal fault set from the
network, processors {3, 6, 7, 10, 11, 12, 13, 14, 15, 19, 22, 30, 31} are isolated from the original
system. The surviving system {0, 1, 2, 4, 5, 8, 9, 16, 17, 18, 20, 21, 23, 24, 25, 26, 27, 28,
29} is shown in Fig. 5.5.

Based on the construction of the surviving system $S_{sur}$, an interesting property of the
restriction array $[C_R]$ is derived, as described in Theorem 1. Theorem 1 states that the data
from a subset of the rows of $[C_R]$ alone is sufficient to represent $[C_R]$. We will show in the
next section that this property saves considerable computational effort when a processor finds
all the shortest routes to other processors.

THEOREM 1: Let $[C_R]_i$, $i \in S_{sur}$, denote the ith row of $[C_R]$, which corresponds to
processor i in $S_{sur}$. If both j and j + N/2 are processors in $S_{sur}$, $0 \le j \le N/2 - 1$,
then $[C_R]_j$ and $[C_R]_{j+N/2}$ are identical, i.e., $[C_R]_{j,k} = [C_R]_{j+N/2,k}$ for all
$k \in S_{sur}$.

Proof: Different situations are considered with respect to switching elements and links
in $F_{max}$.

Processor pairs {k, k + N/2} or {2k, 2k + 1}, $0 \le k \le N/2 - 1$, which are connected to
switching elements in $F_{max}$ from the input or output stages, respectively, are removed since
they do not belong to $S_{sur}$. Thus, only switching elements from stage 1 to stage n-2 in
$F_{max}$ affect the communication between processors in $S_{sur}$ and need to be considered. As
we have mentioned, any switching element at stage i, $1 \le i \le n - 2$, is a common root of two
communication trees, and $2^{i+1}$ source processors are connected to $2^{n-i}$ destination
processors through this switching element (for a single pass through the network). If this
switching element is in $F_{max}$, then the outgoing paths from these $2^{i+1}$ source processors
to those $2^{n-i}$ destination processors will be destroyed. Since $i \ge 1$, at least four source
processors are affected by a switching element in $F_{max}$ at stage i, $1 \le i \le n - 2$. That
is, any switching element $e = (e_{n-1}, e_{n-2}, \ldots, e_1)$ at stage i in $F_{max}$ will destroy
the outgoing paths to the set of destination processors
$\{(e_i, e_{i-1}, \ldots, e_2, e_1, c, \ldots, c)\}$, #(c) = n - i, of the $2^{i+1}$ source processors
with the common address label $(c, \ldots, c, e_{n-1}, \ldots, e_{i+1})$, where $2^{i+1} \ge 4$ or
#(c) = i + 1 ≥ 2.

These $2^{i+1}$ source processors can always be partitioned into four groups such that in each
group the first two bits of the addresses of all the processors are the same and each processor
has the relative address label $(c, \ldots, c, e_{n-1}, e_{n-2}, \ldots, e_{i+1})$, #(c) = i - 1. Note
that each relative address in each of the four groups has n-2 bits. Each group is a subset of one
of the four subsets of processors $\psi_x = \{x \cdot N/4 + y \mid 0 \le y \le N/4 - 1\}$,
$0 \le x \le 3$. That is,
$\{(0, 0, c, \ldots, c, e_{n-1}, e_{n-2}, \ldots, e_{i+1})\} \subset \psi_0 = \{(0, 0, y_{n-3}, y_{n-4}, \ldots, y_0)\}$,
$\{(0, 1, c, \ldots, c, e_{n-1}, e_{n-2}, \ldots, e_{i+1})\} \subset \psi_1 = \{(0, 1, y_{n-3}, y_{n-4}, \ldots, y_0)\}$,
$\{(1, 0, c, \ldots, c, e_{n-1}, e_{n-2}, \ldots, e_{i+1})\} \subset \psi_2 = \{(1, 0, y_{n-3}, y_{n-4}, \ldots, y_0)\}$, and
$\{(1, 1, c, \ldots, c, e_{n-1}, e_{n-2}, \ldots, e_{i+1})\} \subset \psi_3 = \{(1, 1, y_{n-3}, y_{n-4}, \ldots, y_0)\}$.
Note that not all the $2^{i+1}$ processors
$\{(c, \ldots, c, e_{n-1}, e_{n-2}, \ldots, e_{i+1}) \mid \#(c) = i + 1 \ge 2\}$ may exist in $S_{sur}$,
since some of them may be removed due to the switching elements in $F_{max}$ from the input or
output stages. Therefore, we are ready to make the following conclusions. Any switching element
in $F_{max}$ will destroy the communication from four source processors with the same relative
address in each of the four subsets to the same set of destination processors. Of course, we
assume here that two or more of these four source processors exist in $S_{sur}$, i.e., at most
two of them are removed due to the switching elements in $F_{max}$ from the input or output
stages; otherwise it becomes a trivial case. Thus, the combined effect of all the switching
elements in $F_{max}$ on these processors in $S_{sur}$ with the same relative address in each of
the four subsets is that the resulting communication capability of these processors for a single
pass through the network is identical. That is, if $F_{max}$ contains only switching elements
and if $U = \{k + l \cdot N/4 \mid 0 \le l \le 3\} \cap S_{sur} \ne \emptyset$, $0 \le k \le N/4 - 1$,
then for all $x \in U$, the rows $[C_R]_x$ are identical. It is also true that if $F_{max}$ contains
only switching elements and both processors j and j + N/2 are in $S_{sur}$,
$0 \le j \le N/2 - 1$, then $[C_R]_j$ and $[C_R]_{j+N/2}$ are identical.

Similarly, for any communication link in $F_{max}$, the communication capability of at least
two processors is affected. All the affected processors can be partitioned into two groups.
Each group corresponds to the same relative addresses in one of the two subsets of processors
$\psi_x = \{x \cdot N/2 + y \mid 0 \le y \le N/2 - 1\}$, $0 \le x \le 1$. Therefore, if $F_{max}$
contains only links and both processors j and j + N/2 are in $S_{sur}$, $0 \le j \le N/2 - 1$,
then $[C_R]_j$ and $[C_R]_{j+N/2}$ are identical. □

Fig. 5.4(a). A 32 x 32 Omega network with a fault set F.

Fig. 5.4(b). An alternative drawing of the Omega network with the same fault set.

Fig. 5.5. The surviving substructure of the system in Fig. 5.4 after removing all the
components in the maximal fault set.

For example, Table 5.1 shows the restriction array $[C_R]$ and the connection array $[C_S]$
for the surviving system in Fig. 5.5, where each marked entry represents 1 and every other
entry represents -1. The results can be checked directly from Fig. 5.5. Moreover, we have
$[C_R]_0 = [C_R]_{16}$, $[C_R]_1 = [C_R]_{17}$, $[C_R]_2 = [C_R]_{18}$, and so on, in agreement
with Theorem 1.

5.3.D. Construction of Shortest-Path Routing Tables

After the restriction array $[C_R]$ or the connection array $[C_S]$ of $S_{sur}$ is obtained, it is
straightforward to model the multipass routing problem on $S_{sur}$ by using a simple directed
multigraph. Obviously, knowing whether there exist communication paths between a pair of
processors of $S_{sur}$ by going through the faulty network in multiple passes is equivalent to
determining the reachability between the two corresponding vertices on a simple directed
multigraph. In order to reconfigure the surviving system in the most efficient way, an
appropriate fault-tolerant routing scheme between the surviving processors needs to be
developed, which is our major concern in this section. We show that a breadth-first-search
algorithm [PrYe73] can be used to find the shortest multipass communication paths between any
pair of processors in the surviving system. Hereafter, the two terms, communication paths and
paths, will be used interchangeably to represent routing paths by one or more passes through
the faulty network unless otherwise specified.

Imagine that we have a set V with #($S_{sur}$) vertices which are indexed on the set
$S_{sur}$. For each vertex $v_i \in V$, there is a corresponding processor i, $i \in S_{sur}$. A
#($S_{sur}$)-vertex directed multigraph, G, can be constructed as follows: there is an arc
$(v_i, v_j)$ from $v_i$ to $v_j$ iff $[C_S]_{i,j} = 1$, where $v_i, v_j \in V$. In other words, the
arc $(v_i, v_j)$ exists iff processor i can access processor j by the first pass through the
faulty network. Of course, the loop $(v_i, v_i)$ always exists on each vertex $v_i$. A simple
directed multigraph is a graph such that for any two vertices $v_i$ and $v_j$, there exists at
most one arc from $v_i$ to $v_j$ and at most one arc from $v_j$ to $v_i$. Because of the
unique-path property of the Omega network, it is very easy to show that the graph G is simple.
Also, if we assume that all the single-pass communication paths between any pair of network
input and output are equally important, G will be an equally weighted graph with the same
weight on all its arcs. Therefore, we have modeled the surviving system by a simple directed
multigraph G which is equally weighted.

Define a new array $[C_S^*]$ such that for all $i, j \in S_{sur}$, if $[C_S]_{i,j} = -1$ then
$[C_S^*]_{i,j} = 0$; else $[C_S^*]_{i,j} = 1$. A vertex $v_j$ is said to be reachable from another
vertex $v_i$ iff there is a path from $v_i$ to $v_j$, i.e., $[C_S^*]^k_{i,j} \ne 0$ for some
$k \ge 1$ (here $[C_S^*]^k$ represents the kth power of $[C_S^*]$). The order of a path (the
number of arcs on the path) connecting $v_i$ to $v_j$ represents the number of passes needed
through the faulty network for processor i to access processor j. All the intermediate vertices
on the path represent the intermediate processors which need to be traversed. It can be proved
[HoSa78] that if vertex $v_j$ is reachable from vertex $v_i$ on G, then $[C_S^*]^k_{i,j} \ne 0$ for
at least one k where $1 \le k \le \#(S_{sur}) - 1$. The graph G is said to be strongly connected
if for each pair of vertices $v_i$ and $v_j$, there exists at least one path from $v_i$ to $v_j$
and one path from $v_j$ to $v_i$. Thus, G is strongly connected iff the surviving system
$S_{sur}$ has the DFA property. Generally speaking, to determine the DFA property of $S_{sur}$ we
need to traverse the faulty network at most #($S_{sur}$) - 1 passes and check each entry of each
array $[C_S^*]^k$, i.e., determine the reachability between any pair of processors. A similar
method is used in [AgLe85] to understand the DFA property of multiprocessors interconnected by
a faulty network. However, the computational complexity is prohibitively high as the size of
the system increases.

Table 5.1. The restriction array $[C_R]$ and the connection array $[C_S]$ for the surviving
system in Fig. 5.5.
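
For comparison with the matrix-power test, strong connectivity of G can also be checked with one graph search per vertex. The sketch below is an illustration only. It assumes the connection array is given as a square list of rows with entries 1 or -1, indexed by the position of each surviving processor, and its function names are hypothetical.

```python
from collections import deque


def reachable_from(c_s, src):
    """Vertices reachable from src by following arcs with [C_S] entry 1."""
    size = len(c_s)
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in range(size):
            if c_s[u][v] == 1 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def has_dfa(c_s):
    """True iff every vertex reaches every other one (G strongly connected)."""
    size = len(c_s)
    return all(len(reachable_from(c_s, s)) == size for s in range(size))


if __name__ == "__main__":
    ring = [[1, 1, -1], [-1, 1, 1], [1, -1, 1]]      # 0 -> 1 -> 2 -> 0
    broken = [[1, 1, -1], [-1, 1, -1], [1, -1, 1]]   # arc 1 -> 2 removed
    print(has_dfa(ring), has_dfa(broken))
```

Each search costs time quadratic in the number of surviving processors with this array representation, so the whole test is cubic, without forming any matrix powers.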

As long as the shortest communication paths are used for communication between
processors in the surviving system, an appropriate shortest-path routing table for a processor
to access other processors must be supplied. In order to access a destination processor, the
corresponding entry of the shortest-path routing table for a source processor must include the
following information:

(1)  the  proper  intermediate  processors  through  which  its  data  packets  will  be  routed  in  the 
first  pass  through  the  faulty  network, 

(2)  the  minimum  number  of  passes  through  the  faulty  network  to  arrive  at  the  destination 
processor. 

Therefore, the shortest-path fault-tolerant routing scheme on the surviving system is described
as follows. Whenever a data packet arrives at an intermediate processor after a pass through
the network, the control portion of this data packet contains

(1) the source and destination addresses (which will not be changed during communication),

(2)  the  number  of  passes  left  to  reach  the  destination  processor, 

(3)  the  address  of  next  intermediate  processor  (there  may  exist  many  possible  ones)  for  the 
next  pass  through  the  network;  this  address  is  appended  after  the  entry  corresponding  to 
the  destination  processor  of  the  shortest-path  routing  table  of  the  current  intermediate 
processor  has  been  referred  to. 

The address of the next intermediate processor is used as the temporary routing tag for the
next pass through the network if it is not equal to the destination address. The number of
passes left to reach the destination processor is decremented by one after the next pass
through the network and compared with the value in the shortest-path routing table of the next
intermediate processor for advanced fault-tolerant control. Thus, the information about the
intermediate processors for the first pass given in each entry of a shortest-path routing table
is sufficient to support this kind of fault-tolerant routing scheme. It is clear that the
bit-oriented routing scheme of the original system (i.e., the Omega network) has been
preserved in the surviving system.
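
The forwarding rule can be illustrated with a small simulation. The sketch below assumes that every processor holds a table mapping each destination to a pair (set of first-pass intermediate processors, minimum number of passes). The table layout, the choice of the smallest candidate as next hop, and the function name are assumptions made for the example.

```python
def route(tables, src, dst):
    """Return the list of processors a packet visits from src to dst."""
    hops = [src]
    current = src
    intermediates, passes_left = tables[current][dst]
    while current != dst:
        if not intermediates:
            raise ValueError("destination is not reachable")
        # Any listed intermediate processor is valid; pick one deterministically.
        nxt = min(intermediates)
        hops.append(nxt)
        passes_left -= 1  # one more pass through the network is done
        current = nxt
        if current != dst:
            # Consult the table of the new intermediate processor and compare
            # its pass count with the count carried in the packet.
            intermediates, expected = tables[current][dst]
            if expected != passes_left:
                raise ValueError("routing tables are inconsistent")
    return hops


if __name__ == "__main__":
    # Hypothetical tables for three processors connected as 0 -> 1 -> 2 -> 0.
    tables = {
        0: {0: ({0}, 1), 1: ({1}, 1), 2: ({1}, 2)},
        1: {0: ({2}, 2), 1: ({1}, 1), 2: ({2}, 1)},
        2: {0: ({0}, 1), 1: ({0}, 2), 2: ({2}, 1)},
    }
    print(route(tables, 0, 2))
```

The comparison of the remaining pass count with the table of the next processor mirrors the consistency check mentioned above; how a mismatch should be handled is left open here.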

By Algorithm 1 and some data manipulation, the connection array $[C_S]$ is obtained. The
data of $[C_S]$ is then broadcast from the reconfiguration unit to each processor in $S_{sur}$,
where a breadth-first-search algorithm is performed to find the shortest-path routing table for
the processor itself. The advantage is that an identical breadth-first-search procedure using an
identical set of input data is executed in parallel in each processor of $S_{sur}$. To implement
the breadth-first-search algorithm, we need a queue data structure that allows the two
operations enqueue and dequeue. This type represents a list of elements that are to be handled
in a first-come-first-served manner. The function first(Q) denotes the element at the front of
the queue Q. According to the shortest-path routing scheme, for a processor i following the
shortest paths to access the destination processor j, the information about the proper
intermediate processors of the first pass through the faulty network is sufficient. Thus, to
access processor j, a set intermediate[j] is used to include all the possible intermediate
processors of the first pass, and passes[j] is used to indicate the minimum number of passes
required through the faulty network. Initially, intermediate[j] is an empty set and
passes[j] = 1 for all $j \in S_{sur}$. An index set I is used to contain all the intermediate
processors which processor i can reach in the first pass through the faulty network.
Eventually, the shortest-path routing table of processor i is given by the ith row of an array
$[A_S]$, i.e., $[A_S]_{i,j} = (intermediate[j], passes[j]) = (a_{i,j}, p_{i,j})$ gives the routing
information to access processor j. Thus, if intermediate[j] ≠ ∅ in $[A_S]_{i,j}$, then processor i
can access processor j by routing its data packets through any one of the intermediate
processors in intermediate[j] in the first pass, and its data packets will arrive at processor j
in the minimum of passes[j] passes through the faulty network. However, if intermediate[j] = ∅
for at least one j, then processor i cannot access all the processors in $S_{sur}$. Algorithm 3
gives the procedure to construct the shortest-path routing table for a processor i in the
surviving system $S_{sur}$.

Algorithm 3:

{ Construct the shortest-path routing table for each processor in the surviving system }
procedure Breadth-First-Search (i: surviving processor, [C_S]: array)
    Q ← ∅  { empty queue }
    I ← ∅  { empty set of intermediate processors }
    for each surviving processor j such that [C_S]_{i,j} ≠ -1 do
        I ← I ∪ {j}
        intermediate[j] ← intermediate[j] ∪ {j}
        enqueue j into Q
    { loop to find shortest paths }
    if I ≠ ∅ then
        while Q ≠ ∅ do
            j ← first(Q)
            dequeue j from Q
            for each k such that [C_S]_{j,k} ≠ -1 do
                if (passes[k] = 1 or passes[k] - 1 = passes[j])
                   and k is a surviving processor not in I
                then
                    intermediate[k] ← intermediate[j] ∪ intermediate[k]
                    passes[k] ← passes[j] + 1
                    enqueue k into Q
    { the shortest-path routing table }
    for each surviving processor j do
        [A_S]_{i,j} ← (intermediate[j], passes[j])

The  correctness  of  Algorithm  3  is  discussed  as  follows. 

Algorithm 3 is essentially a procedure applying a breadth-first search on the multigraph G
mentioned above. Hence, to obtain the minimal number of passes through the faulty network for
processor i to access other processors, we simply traverse the multigraph G starting from
$v_i$ (processor i) using a breadth-first search. That is to say, we start from $v_i$ and visit
all the sons of $v_i$, then visit all the grandsons of $v_i$, and so on. The visiting continues
until all the visitable vertices have been visited. It can be shown that for $v_j$ to be
visited, the paths from $v_i$ via its parents are shortest. Thus, the first step of shortest
paths from $v_i$ to $v_j$ is the same as the first step of shortest paths from $v_i$ to the
parents of $v_j$. Therefore, the shortest paths from $v_i$ (processor i) to all the visitable
vertices (processors) are traversed. Moreover, the breadth-first search searches all the
possible descendant vertices (processors) from vertex i. If at a level of search no new son
vertices are visited, it means that a subset of previously visited vertices will be visited
again. Since all the son vertices of these previously visited vertices have been exhaustively
searched, no possible paths exist from these visited vertices to other new vertices. That is to
say, the search is terminated if at a level of search no new son vertices are visited (all the
searching branches become loops). The above argument proves the following fact: there are no
paths from vertex i to vertex j iff no shortest paths from vertex i to vertex j are found by
Algorithm 3. Assume that the total number of arcs traversed is E when Algorithm 3 is executed
starting from a vertex on G. It is easy to show that the time complexity of Algorithm 3 (which
is a breadth-first-search algorithm) is $O(E + N) = O(\max(E, N))$. We will prove that
$O(\max(E, N)) = O(N^2)$ in Section 5.4, where the overall time complexity of our
fault-tolerant reconfiguration scheme is discussed.

According to Theorem 1, a lot of computational effort can be saved by using the restriction
array $[C_R]$ instead of the connection array $[C_S]$. It is clear that in the worst case,
where the size of the surviving system is N, the first half of the rows of $[C_R]$, i.e.,
$\{[C_R]_i \mid 0 \le i \le N/2 - 1\}$, will provide enough information to find all the
shortest-path routing tables. Moreover, at most N/2 processors will be sufficient for computing
all the N shortest-path routing tables.

For example, the shortest-path routing tables associated with the faulty Omega network in
Fig. 5.5 are shown in Table 5.2. For simplicity, only the parameter $P_{i,j}$ (the minimal
number of passes) of an entry $[A_S]_{i,j}$ is shown in Table 5.2.
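
The breadth-first search of Algorithm 3 can be illustrated with a minimal Python sketch. It is
not the original implementation: the function name, the 0-based processor indices, and the
example connection array are assumptions made for illustration. The sketch also deviates from
the listing in two labeled ways: each processor is enqueued only once, and passes[j] = 0 (rather
than the initial value 1) marks a processor that is never reached.

```python
from collections import deque


def shortest_path_routing_table(i, conn):
    """Return {j: (intermediate[j], passes[j])} for source processor i.

    Assumption: conn[a][b] != -1 means processor a reaches processor b
    in a single pass through the faulty network.
    """
    n = len(conn)
    intermediate = {j: set() for j in range(n)}
    passes = {j: 0 for j in range(n)}  # 0 = not reached (sketch convention)
    queue = deque()

    # First pass: every directly reachable j is its own intermediate.
    for j in range(n):
        if conn[i][j] != -1:
            intermediate[j] = {j}
            passes[j] = 1
            queue.append(j)

    # Later passes: inherit the first-pass intermediates of the parent.
    while queue:
        j = queue.popleft()
        for k in range(n):
            if conn[j][k] == -1:
                continue
            if passes[k] == 0:  # k is seen for the first time
                passes[k] = passes[j] + 1
                intermediate[k] = set(intermediate[j])
                queue.append(k)
            elif passes[k] == passes[j] + 1:  # another shortest path to k
                intermediate[k] |= intermediate[j]
    return {j: (intermediate[j], passes[j]) for j in range(n)}


# Hypothetical 4-processor example: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
conn = [
    [-1, 0, 0, -1],
    [-1, -1, -1, 0],
    [-1, -1, -1, 0],
    [-1, -1, -1, -1],
]
table = shortest_path_routing_table(0, conn)
print(table[3])  # ({1, 2}, 2): processor 3 is reached in two passes via 1 or 2
```

Because all processors at one level are dequeued before any processor of the next level, the
set intermediate[k] is complete by the time k itself is dequeued.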

5.3.E.  Reconfiguration  of  The  Surviving  System 

By Algorithm 3, each processor in the surviving system obtains its shortest-path routing
table. In some situations, some processors may only be able to access a part of the processors
in the surviving system. For example, in Fig. 5.5, processor 1 cannot access processors {4, 5,
20, 21, 23, 28, 29}. Such situations are due to the criticality of the fault set with respect to
the surviving system. An implicit deadlock arises under such circumstances, and shortest-path
routing tables become traps if they are not used cautiously. Consider the following case on the
surviving system in Fig. 5.5. Processor 5 can dynamically fully access all the processors in
the surviving system. If data packets from processor 5 are sent to processor 1, no acknowledge
signals can be received by processor 5, because processor 1 cannot access processor 5. That is
to say, there will be a deadlock between processor 1 and processor 5. Such a situation must be
avoided. Therefore, from the viewpoint of a reliable reconfigurable environment, a deadlock-free
reconfiguration algorithm must be employed so that only bidirectional communication is
maintained on the surviving system. Central to the design of such a deadlock-free procedure is
the sacrifice of some usable links or switching elements and the partitioning of the surviving
system into a number of subsystems, each of which possesses the DFA property (i.e., the fault
set is noncritical with respect to each subsystem). In order to utilize the surviving system in
the most efficient way, a subsystem should be a maximal disjoint set which includes exactly
those processors that have bidirectional communication capability among them. Such a
deadlock-free reconfiguration algorithm is our goal in this section.

Table 5.2. The simplified shortest-path routing tables of the surviving system in Fig. 5.5.

The reconfiguration monitor is notified of the criticality of the fault set whenever a
surviving processor finds that it cannot access some processors in the surviving system, i.e.,
$\alpha_{i,j} = \emptyset$ for some surviving processors i and j. All the shortest-path routing
tables are collected by the reconfiguration monitor, where array $[A_S]$ is inspected.
According to the following definition, the reachability array $[H_S]$ corresponding to $[A_S]$
is obtained.

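DEFINITION 6: $[H_S]$ is a boolean array associated with $[A_S]$, with one row and one column
for each processor in the surviving system, such that for surviving processors i and j, entry
$[H_S]_{i,j}$ is a boolean constant which is either True or False. $[H_S]_{i,j}$ is defined as
follows: $[H_S]_{i,j}$ = False if $\alpha_{i,j} = \emptyset$; otherwise $[H_S]_{i,j}$ = True. □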

Each $[H_S]_{i,j}$ = True indicates that there exists at least one path from processor i to
processor j. To avoid the problem of deadlock, the use of unidirectional communication paths
between two processors must be prohibited. For example, if $[H_S]_{i,j}$ = True but
$[H_S]_{j,i}$ = False, then paths from processor i to processor j should not be used again. A
new array $[H_S^*]$ is employed to monitor the status of bidirectional communication between
any two processors in the surviving system. Array $[H_S^*]$ is defined as follows.

DEFINITION 7: $[H_S^*]$ is a boolean array of the same size as, and associated with, array
$[H_S]$ such that for surviving processors i and j, entry
$[H_S^*]_{i,j} = [H_S^*]_{j,i} = [H_S]_{i,j} \wedge [H_S]_{j,i}$, where $\wedge$ is a boolean
AND operation. □

Thus, array $[H_S^*]$ is a symmetric array with diagonal elements equal to True. For any
surviving processors i and j, $[H_S^*]_{i,j}$ = True iff there exist bidirectional
communication paths between processor i and processor j. It is obvious that array $[H_S^*]$
consists of a number of "True" blocks in which all the entries are equal to True. The method to
construct all the possible maximal disjoint sets in the surviving system is based on the
following argument: a processor i belongs to a disjoint set $C_j$, which must be maximal, iff
$[H_S^*]_{i,k}$ = True for all $k \in C_j$, and $[H_S^*]_{i,l}$ = False for all surviving
processors l outside $C_j$. Thus, the subsystem $C_j$ is the maximal disjoint set which contains
processor i and possesses the DFA property, i.e., the fault set is noncritical with respect to
$C_j$. This argument is formally described by the following theorem.

THEOREM 2: Assume that there exist a surviving processor i and a subset T of the surviving
system such that $[H_S^*]_{i,k}$ = True for all $k \in T$, and $[H_S^*]_{i,l}$ = False for all
surviving processors l outside T. Then, for any $x, y \in T$ and any surviving processor z
outside T, $[H_S^*]_{x,y}$ = True and $[H_S^*]_{x,z}$ = False. Moreover, T is the maximal
disjoint set which contains processor i and possesses the DFA property.

Proof: Since $[H_S^*]$ is a symmetric array with diagonal elements equal to True, $i \in T$ is
always true. If $[H_S^*]_{i,k}$ = True for all $k \in T$, and $[H_S^*]_{i,l}$ = False for all
surviving processors l outside T, then by symmetry $[H_S^*]_{k,i}$ = True and $[H_S^*]_{l,i}$ =
False. Thus, for any $x, y \in T$, there exist communication paths both from x to y and from y
to x (via processor i). That is, by Algorithm 3, shortest paths both from x to y and from y to
x have been found. Therefore, $[H_S^*]_{x,y}$ = True for all $x, y \in T$. However, if there
exists a surviving processor z outside T such that $[H_S^*]_{x,z}$ = True for some x in T other
than i, this means that there exist bidirectional communication paths between z and i via x,
i.e., $[H_S^*]_{i,z}$ = True, which contradicts our assumption. Therefore, $[H_S^*]_{x,z}$ =
False for all surviving processors z outside T and all $x \in T$. Moreover, the elements of T
are all the possible processors possessing bidirectional communication capability with processor
i. This is based on the fact that array $[H_S^*]$ is a modified array derived from the
reachability array $[H_S]$. Therefore, T is the maximal disjoint set which contains processor i
and possesses the DFA property. □
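
Definitions 6 and 7 and the set T of Theorem 2 can be made concrete with a short Python sketch.
The function names and the list-of-lists table layout, where table[i][j] holds the pair
(intermediate set, passes), are assumptions introduced here for illustration only. The diagonal
of the bidirectional array is forced to True, matching the property stated above that a
processor always belongs to its own set.

```python
def reachability(table):
    """Definition 6: H[i][j] is True iff the intermediate set of entry (i, j) is nonempty."""
    n = len(table)
    return [[len(table[i][j][0]) > 0 for j in range(n)] for i in range(n)]


def bidirectional(H):
    """Definition 7: H*[i][j] = H[i][j] AND H[j][i]; the diagonal is set to True."""
    n = len(H)
    return [[i == j or (H[i][j] and H[j][i]) for j in range(n)] for i in range(n)]


def maximal_set(H_star, i):
    """Theorem 2: the maximal disjoint set of processor i is the True part of row i."""
    return {k for k, ok in enumerate(H_star[i]) if ok}


# Hypothetical 3-processor tables: 0 and 1 reach each other, 2 reaches both,
# but neither 0 nor 1 can reach 2.
EMPTY = (set(), 0)
table = [
    [({0}, 1), ({1}, 1), EMPTY],
    [({0}, 1), ({1}, 1), EMPTY],
    [({0}, 1), ({0}, 2), ({2}, 1)],
]
H_star = bidirectional(reachability(table))
print(maximal_set(H_star, 0))  # {0, 1}
print(maximal_set(H_star, 2))  # {2}
```

In this example the one-way paths from processor 2 to processors 0 and 1 are exactly the ones
that the bidirectional array filters out.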

By Theorem 2, the maximal disjoint set $C_j$ to which processor i belongs is composed of those
processors k such that $[H_S^*]_{i,k}$ = True. Thus, the shortest-path routing table of
processor i can become a deadlock-free one by modifying $[A_S]_i$ as follows.

{ modify shortest-path routing tables to deadlock-free ones }
procedure Update (i: surviving processor, [H_S^*]: boolean array)
    for each surviving processor j such that [H_S^*]_{i,j} = False do
        [A_S]_{i,j} ← (∅, 0);

For example, the updated routing tables $[A_S]$ for the surviving system in Fig. 5.5 are
shown in Table 5.3. The unidirectional communication paths from, say, processor 20 to
processors {0, 1, 2, 8, 9, 16, 17, 18, 24, 25, 26, 27} are discarded.
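
The Update procedure amounts to blanking every routing entry whose destination lacks a return
path. A minimal Python sketch follows; the function name and the row layout (a list of
(intermediate set, passes) pairs) are hypothetical and chosen only for illustration.

```python
def make_deadlock_free(row, h_star_row):
    """Replace entry j of a routing-table row by (empty set, 0) when H*[i][j] is False."""
    return [entry if ok else (set(), 0) for entry, ok in zip(row, h_star_row)]


# Hypothetical row of processor 2: it can reach 0 and 1, but only one way.
row = [({0}, 1), ({0}, 2), ({2}, 1)]
h_star_row = [False, False, True]
print(make_deadlock_free(row, h_star_row))  # [(set(), 0), (set(), 0), ({2}, 1)]
```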

However, for the reconfiguration monitor to obtain global information on the utilization of
the surviving system, a better way to implement the deadlock-free reconfiguration algorithm is
described as follows. The work of reconfiguring the surviving system into smaller subsystems is
essentially to group processors in the surviving system into maximal disjoint sets such that in
each set the DFA property (i.e., the bidirectional communication capability) is preserved.
Initially, each surviving processor is in a set of its own. A canonical object, named set[i], is
chosen to serve as the label for the set of processor i, for each surviving processor i. Since
there is no preference for the choice of labels as long as they are canonical, the address of a
processor is sufficient to serve as the label, i.e., set[i] = i initially. A reference processor
nonzero[i] for processor i contains the address of an arbitrary processor j such that
$[H_S^*]_{i,j}$ = True. The reconfiguration starts from an arbitrary processor in the surviving
system by searching for other processors and changing the label of another processor to its own
label iff there exist bidirectional communication paths between them. A processor k is included
in a disjoint set iff the label of its reference processor has not been changed, i.e.,
set[nonzero[k]] = nonzero[k]. Eventually, all the labels of processors in a maximal disjoint set
will be the same and equal to the label of some arbitrary processor in the maximal disjoint set.
The following is the deadlock-free reconfiguration algorithm.

Algorithm 4:

{ Reconfiguration of the Surviving System }
procedure Deadlock-Free Reconfiguration (the surviving system; [H_S^*]: boolean array)
    Q ← ∅  { empty queue }
    N ← 0
    for each surviving processor i do
        enqueue i into Q
    { find maximal disjoint sets C_j's }
    while N ≠ (number of surviving processors) or Q ≠ ∅ do
        j ← first(Q)
        k ← nonzero[j]
        if set[k] = k then
            for each l such that [H_S^*]_{j,l} = True do
                C_j ← C_j ∪ {l}
                set[l] ← j
            N ← N + #(C_j)
        dequeue j from Q

Algorithm 4 scans each processor in the surviving system (they may be selected randomly) by
searching all the possible processors for a maximal disjoint set until all the surviving
processors have been classified into different maximal disjoint sets. To distinguish maximal
disjoint sets which have been previously found from the maximal disjoint set which is currently
being constructed, we need only check whether or not the nonzero parameter of the processor
currently searched has been changed (by Theorem 2). After Algorithm 4 has been executed, the
information of a maximal disjoint set is sent back from the reconfiguration monitor to each
processor of the maximal disjoint set. Then, the shortest-path routing table of a processor is
updated to a deadlock-free one according to the maximal disjoint set to which the processor
belongs.

{ update shortest-path routing tables to deadlock-free ones }
procedure Update (C_i: maximal disjoint set; j ∈ C_i: processor)
    for each surviving processor k not in C_i do
        [A_S]_{j,k} ← (∅, 0);

After all the shortest-path routing tables (i.e., array $[A_S]$) have been modified, a number
of usable links and switching elements have been implicitly sacrificed. These components make
up the unidirectional communication paths which were discarded when array $[H_S^*]$ was derived
from $[H_S]$. Also, a number of subsystems are formed from the original surviving system. Each
maximal disjoint set corresponds to such a subsystem.

For example, by Algorithm 4, the surviving system in Fig. 5.5 is partitioned into two
subsystems with the DFA property. These two subsystems are {0, 1, 2, 8, 9, 16, 17, 18, 24, 25,
26, 27} and {4, 5, 20, 21, 23, 28, 29}, respectively. The partitioning results are shown in Fig.
5.6. Implicitly, the usable switching elements {(1,8), (1,10), (1,15), (2,13), (3,4), (3,8),
(3,14)} and some usable links connected to them are sacrificed.
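
The grouping performed by Algorithm 4 can be sketched in Python as follows. This is a
simplified variant, not the listing above: instead of the nonzero[] reference test it records
directly which processors have already been assigned, and the function name and the
dictionary-based layout of the bidirectional array are assumptions made for illustration. The
example array is built from the two subsystems reported for Fig. 5.5, so it only checks that the
grouping recovers them.

```python
def partition(processors, h_star):
    """Group processors into maximal disjoint sets with bidirectional communication.

    h_star[i][j] is assumed True iff i and j can reach each other, with
    h_star[i][i] True for every processor i.
    """
    label = {p: p for p in processors}  # set[i] = i initially
    classes = {}
    assigned = set()
    for j in processors:
        if j in assigned:  # j already belongs to a set found earlier
            continue
        members = {l for l in processors if h_star[j][l]}
        for l in members:
            label[l] = j  # relabel every member with the representative j
        assigned |= members
        classes[j] = members
    return label, classes


# Bidirectional array built from the two subsystems listed for Fig. 5.5.
a = {0, 1, 2, 8, 9, 16, 17, 18, 24, 25, 26, 27}
b = {4, 5, 20, 21, 23, 28, 29}
procs = sorted(a | b)
h_star = {i: {j: (i in a) == (j in a) for j in procs} for i in procs}
label, classes = partition(procs, h_star)
print(sorted(classes))  # [0, 4]: one representative per subsystem
```

Each processor is examined once and each examination scans one row of the array, so this
variant costs one row scan per processor.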


Fig. 5.6. Two subsystems are formed from the surviving system in Fig. 5.5.

5.4. COMPLEXITY OF THE FAULT-TOLERANT RECONFIGURATION SCHEME

The time complexity of the proposed fault-tolerant reconfiguration scheme is analyzed as
follows. Assume that the time overhead spent on data communication between the reconfiguration
monitor and the processors is negligible.

(1)  Algorithm  1  (the  Accessibility)  takes  a  time  in  O  (A^(logA^)^). 

(2) The manipulation of a variety of arrays, [C*], $[C_R]$, $[C_S]$, $[H_S]$ and $[H_S^*]$,
takes a time in $O(N^2)$.

(3) Algorithm 3 (the breadth-first search) takes a time in $O(N^2)$. (We will explain this
later.)

(4) Algorithm 4 (the Deadlock-Free Reconfiguration) takes a time in $O(N)$.

(5) Updating the shortest-path routing table for a processor takes a time in $O(N)$.

Therefore, the time complexity is dominated by the time spent on the manipulation of a
variety of arrays and on Algorithm 3, both of which are in $O(N^2)$. The following gives the
detailed proof of the time complexity of Algorithm 3, which is a breadth-first search on a
directed multigraph.

Algorithm 3 searches all the accessible processors for a surviving processor by a minimal
number of passes through the faulty network. The complexity is equal to the number of
accessible processors plus the total number of single-pass communication paths traversed (i.e.,
the total number of arcs traversed on G) to search these accessible processors. The number of
accessible processors for a surviving processor is at most N. However, calculating the number
of single-pass communication paths required through the faulty network takes more effort. The
difficulty is that the number of single-pass communication paths traversed depends closely on
the distribution of faulty components, and a closed form of this relationship is hard to
obtain. Thus, an approximate method is used to find an upper bound. Assume that a processor can
access all the N processors by at least k+1 passes through the faulty network, $1 \le k \le
N-1$. That is, by Algorithm 3, $(N - x_1)$ processors are searched by the first pass, $(x_1 -
x_2)$ processors are searched by the second pass, ..., $(x_{k-1} - x_k)$ processors are
searched by the kth pass, and $x_k$ processors are searched by the (k+1)th pass, where $N >
x_1 > x_2 > x_3 > \cdots > x_k > 0$ and all $x_i$'s are integers. For the first pass, the
number of single-pass paths traversed is equal to $(N - x_1)$. For the second pass, not all the
$(N - x_1)$ intermediate processors which have been searched in the first pass can access all
the $(x_1 - x_2)$ processors. In general, part of these $(x_1 - x_2)$ processors are searched by
routing through part of those $(N - x_1)$ intermediate processors, and another part through
some of the others. Because of the symmetric structure of an Omega network, we may think that
those $x_1$ processors which cannot be searched by the first pass are uniformly distributed over
the $(N - x_1)$ intermediate processors. Hence, a reasonable estimate of the portion of the $(N
- x_1)$ processors through which the $(x_1 - x_2)$ processors can be searched by the second pass
is $(N - x_1)/N$. Therefore, an estimated upper bound of the number of single-pass paths
traversed by the second pass is

\[ \left(\frac{N - x_1}{N}\right) \cdot (N - x_1) \cdot (x_1 - x_2). \]

Similarly, an estimated upper bound of the number of single-pass paths traversed by the ith
pass, $1 < i \le k$, is

\[ \left(\frac{x_{i-2} - x_{i-1}}{x_{i-2}}\right) \cdot (x_{i-2} - x_{i-1}) \cdot (x_{i-1} - x_i), \qquad x_0 = N, \]

and the estimated upper bound of the number of single-pass paths traversed by the (k+1)th
pass is

\[ \left(\frac{x_{k-1} - x_k}{x_{k-1}}\right) \cdot (x_{k-1} - x_k) \cdot x_k. \]
Now the complexity analysis of Algorithm 3 can be modeled as the following problem. For an
arbitrary distribution of a fault set, there exists a number k, $1 \le k \le N - 1$, such that
by Algorithm 3 a surviving processor can search all the accessible processors (either all or a
part of the N processors) by k+1 passes through the faulty network. Let the number of
accessible processors be $M \le N$ and let E denote the total number of single-pass
communication paths traversed by Algorithm 3. It can be shown from the above argument that E is
bounded as follows:

\[
\begin{aligned}
E \le{}& (M - x_1) + \left(\frac{M - x_1}{M}\right)(M - x_1)(x_1 - x_2)
        + \left(\frac{x_1 - x_2}{x_1}\right)(x_1 - x_2)(x_2 - x_3) \\
       & + \cdots + \left(\frac{x_{k-1} - x_k}{x_{k-1}}\right)(x_{k-1} - x_k)\,x_k \\
   ={}& f(x_1, x_2, \ldots, x_k) \\
   ={}& f(\mathbf{x}),
\end{aligned}
\]

where $M > x_1 > x_2 > x_3 > \cdots > x_{k-1} > x_k > 0$ and $x_i$, for all $1 \le i \le k$,
are integers.

Thus, the complexity analysis of Algorithm 3 has become a constrained optimization problem
in which we want to find the maximum of a set of nonlinear functions with inequality
constraints. In general, a nonlinear optimization technique [WiChTS] may be employed to find the
maximal value among all the possible functions $f(\mathbf{x})$ with inequality constraints,
which will be the upper bound of E. That is, we have an optimization problem whose solution
gives the upper bound of E: for each $1 \le k \le N - 1$, find

\[ \max_{\mathbf{x}} f(\mathbf{x}) \]

such that

\[ g_i(\mathbf{x}) < 0, \qquad i = 1, 2, \ldots, k, k+1, \]

where

\[
g_i(\mathbf{x}) =
\begin{cases}
x_1 - M, & i = 1, \\
x_i - x_{i-1}, & 2 \le i \le k, \\
-x_k, & i = k+1.
\end{cases}
\]

However, a simpler way to obtain the upper bound is as follows.

Since each fractional factor in $f(\mathbf{x})$ is at most 1, it is obvious that

\[ h(\mathbf{x}) = M - x_1 + (M - x_1)(x_1 - x_2) + (x_1 - x_2)(x_2 - x_3) + \cdots + (x_{k-1} - x_k)\,x_k \ge f(\mathbf{x}). \]

To find the maximum among functions $h(\mathbf{x})$, a geometrical method is used here. See
Fig. 5.7. It is easy to show that the total area of the shadowed rectangles is equal to
$h(\mathbf{x})$. For any k and any arrangement of the values of the $x_i$'s, this total area is
always less than or equal to $M^2/2$. That is,

\[ \frac{M^2}{2} \ge \max_{\mathbf{x}} h(\mathbf{x}) \ge \max_{\mathbf{x}} f(\mathbf{x}). \]

Fig. 5.7. The diagram used to explain the complexity of Algorithm 3.

The maximal value of M is N, which means that the set of accessible processors of a processor
by multiple passes through the faulty network is all the N processors. Therefore, the upper
bound of E is in $O(N^2)$. The above argument thus proves that Algorithm 3 takes a time in
$O(N^2)$.
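
The chain of bounds $f(\mathbf{x}) \le h(\mathbf{x}) \le M^2/2$ can be checked numerically for
small M with a brute-force Python sketch. The function names are hypothetical, and the
enumeration is only a sanity check of the bound under the stated constraints $M > x_1 > \cdots >
x_k > 0$, not part of the proof.

```python
from itertools import combinations


def f_and_h(M, xs):
    """Evaluate f(x) and h(x) for xs = (x_1, ..., x_k) with M > x_1 > ... > x_k > 0."""
    levels = [M] + list(xs) + [0]  # x_0 = M and x_{k+1} = 0
    # counts[t] = number of processors first reached in pass t + 1
    counts = [levels[t] - levels[t + 1] for t in range(len(levels) - 1)]
    f = h = float(counts[0])  # first-pass term M - x_1
    for t in range(1, len(counts)):
        product = counts[t - 1] * counts[t]
        h += product
        f += product * counts[t - 1] / levels[t - 1]  # fractional factor <= 1
    return f, h


def check_bound(max_m):
    """Return True if f <= h <= M^2 / 2 for every admissible x with 2 <= M <= max_m."""
    for M in range(2, max_m + 1):
        for k in range(1, M):
            for combo in combinations(range(1, M), k):
                xs = tuple(sorted(combo, reverse=True))
                f, h = f_and_h(M, xs)
                if not (f <= h + 1e-9 and h <= M * M / 2 + 1e-9):
                    return False
    return True


print(f_and_h(4, (2,)))  # (4.0, 6.0), while M^2 / 2 = 8
print(check_bound(12))   # True
```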

5.5. SUMMARY

In this chapter we have presented a flexible, real-time reconfiguration scheme for a
multiprocessor system with a faulty network. Even though our fault-tolerant reconfiguration
scheme is developed for an N-processor system interconnected by a $\log_2 N$-stage Omega
network, it can be easily extended to a system interconnected by other networks which are
topologically equivalent to the Omega network [Agr83] and are constructed from switching
elements of different sizes. Moreover, our scheme can be used on a system interconnected by a
k-stage network, k > n, as long as the routing scheme on this network is known. The principle of
the reconfiguration of a system is conceptually to eliminate faulty components and, if
necessary, sacrifice some usable components implicitly without knowing the actual locations of
these components. A deadlock-free environment is provided for the reconfigured system such that
the performance of the system is gracefully degraded. Deadlock-free shortest-path routing
tables are obtained for processors in the surviving system to avoid possible deadlock traps
which may be caused by unidirectional rather than bidirectional communication between some
processors. Because of the bit-oriented routing property of the Omega network [Law75], by
generating the destination tag and referring to the routing table, a data packet from a source
processor can always be routed through a proper intermediate processor during each pass through
the faulty Omega network. Since the routing table provides information on multiple
communication paths, a load-balancing scheme may also be employed to reduce traffic contention.